All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose'
@ 2022-08-01 21:14 Victoria Dye via GitGitGadget
  2022-08-01 21:14 ` [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test Victoria Dye via GitGitGadget
                   ` (9 more replies)
  0 siblings, 10 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye

As part of the preparation for moving Scalar out of 'contrib/' and into Git,
this series moves the functionality of 'scalar diagnose' into a new option
('--diagnose') for 'git bugreport'. This change further aligns Scalar with
the objective [1] of having it only contain functionality and settings that
benefit large Git repositories, but not all repositories. The diagnostics
reported by 'scalar diagnose' relevant for investigating issues in any Git
repository, so generating them should be part of a "normal" Git builtin.

An alternative implementation considered was creating a new 'git diagnose'
builtin, but the new command would end up duplicating much of
'builtin/bugreport.c'. Although that issue could be overcome with
refactoring, I didn't see a major UX benefit of 'git diagnose' vs 'git
bugreport --diagnose', so I went with the latter, simpler approach.

Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
'git bugreport --diagnose', it is not being deprecated in this series.
Although deprecation -> removal could be a future cleanup effort, 'scalar
diagnose' is kept around for now as an alias for users already accustomed to
using it in 'scalar'.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/pull.1275.v2.git.1657584367.gitgitgadget@gmail.com/

Victoria Dye (7):
  scalar: use "$GIT_UNZIP" in 'scalar diagnose' test
  builtin/bugreport.c: create '--diagnose' option
  builtin/bugreport.c: avoid size_t overflow
  builtin/bugreport.c: add directory to archiver more gently
  builtin/bugreport.c: add '--no-report' option
  scalar: use 'git bugreport --diagnose' in 'scalar diagnose'
  scalar: update technical doc roadmap

 Documentation/git-bugreport.txt    |  17 +-
 Documentation/technical/scalar.txt |   9 +-
 builtin/bugreport.c                | 302 ++++++++++++++++++++++++++++-
 contrib/scalar/scalar.c            | 271 +-------------------------
 contrib/scalar/t/t9099-scalar.sh   |   8 +-
 t/t0091-bugreport.sh               |  29 +++
 6 files changed, 358 insertions(+), 278 deletions(-)


base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1310%2Fvdye%2Fscalar%2Fgeneralize-diagnose-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1310/vdye/scalar/generalize-diagnose-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1310
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 21:46   ` Junio C Hamano
  2022-08-01 21:14 ` [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
tests on systems where 'unzip' is not in the system path.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/t/t9099-scalar.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 10b1172a8aa..fac86a57550 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -109,14 +109,14 @@ test_expect_success UNZIP 'scalar diagnose' '
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-	unzip -v "$zip_path" &&
+	"$GIT_UNZIP" -v "$zip_path" &&
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
-	unzip -p "$zip_path" diagnostics.log >out &&
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
-	unzip -p "$zip_path" packs-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
 	grep "$(pwd)/.git/objects" out &&
-	unzip -p "$zip_path" objects-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
 	grep "^Total: [1-9]" out
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  2022-08-01 21:14 ` [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 22:16   ` Junio C Hamano
  2022-08-02  2:17   ` Ævar Arnfjörð Bjarmason
  2022-08-01 21:14 ` [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow Victoria Dye via GitGitGadget
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.

The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
could be useful when diagnosing issues in any Git repository.

Note that, while this patch appears large, it is mostly copied directly out
of 'scalar.c'. Specifically, the functions

- dir_file_stats_objects()
- dir_file_stats()
- count_files()
- loose_objs_stats()
- add_directory_to_archiver()
- get_disk_info()

are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
function is a mostly unmodified copy of 'cmd_diagnose()', with the primary
changes being that 'zip_path' is an input and "Enlistment root" is corrected
to "Repository root" in the logs.

The remainder of the patch is made up of adding the '--diagnose' option to
'cmd_bugreport()' (including generation of the archive's 'zip_path'),
updating documentation, and adding a test. Note that the test is
'test_expect_failure' due to bugs in the original 'scalar diagnose'. These
will be fixed in subsequent patches.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-bugreport.txt |  11 +-
 builtin/bugreport.c             | 282 +++++++++++++++++++++++++++++++-
 t/t0091-bugreport.sh            |  20 +++
 3 files changed, 309 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
index d8817bf3cec..b55658bc287 100644
--- a/Documentation/git-bugreport.txt
+++ b/Documentation/git-bugreport.txt
@@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
 SYNOPSIS
 --------
 [verse]
-'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+'git bugreport' [<options>]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ The following information is captured automatically:
  - A list of enabled hooks
  - $SHELL
 
+Additional information may be gathered into a separate zip archive using the
+`--diagnose` option.
+
 This tool is invoked via the typical Git setup process, which means that in some
 cases, it might not be able to launch - for example, if a relevant config file
 is unreadable. In this kind of scenario, it may be helpful to manually gather
@@ -49,6 +52,12 @@ OPTIONS
 	named 'git-bugreport-<formatted suffix>'. This should take the form of a
 	strftime(3) format string; the current local time will be used.
 
+--diagnose::
+	Create a zip archive of information about the repository including logs
+	and certain statistics describing the data shape of the repository. The
+	archive is written to the same output directory as the bug report and is
+	named 'git-diagnostics-<formatted suffix>'.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 9de32bc96e7..35b1fc48bf1 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -5,6 +5,10 @@
 #include "compat/compiler.h"
 #include "hook.h"
 #include "hook-list.h"
+#include "dir.h"
+#include "object-store.h"
+#include "packfile.h"
+#include "archive.h"
 
 
 static void get_system_info(struct strbuf *sys_info)
@@ -59,7 +63,7 @@ static void get_populated_hooks(struct strbuf *hook_info, int nongit)
 }
 
 static const char * const bugreport_usage[] = {
-	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	N_("git bugreport [<options>]"),
 	NULL
 };
 
@@ -91,6 +95,259 @@ static void get_header(struct strbuf *buf, const char *title)
 	strbuf_addf(buf, "\n\n[%s]\n", title);
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
+static int add_directory_to_archiver(struct strvec *archiver_args,
+				     const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
+static int create_diagnostics_archive(struct strbuf *zip_path)
+{
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	struct strbuf buf = STRBUF_INIT;
+	int res = 0;
+
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+			"Diagnostics complete.\n"
+			"All of the gathered info is captured in '%s'\n",
+			zip_path->buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 int cmd_bugreport(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buffer = STRBUF_INIT;
@@ -98,16 +355,20 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	int report = -1;
 	time_t now = time(NULL);
 	struct tm tm;
+	int diagnose = 0;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
+	size_t output_path_len;
 
 	const struct option bugreport_options[] = {
+		OPT_BOOL(0, "diagnose", &diagnose,
+			 N_("generate a diagnostics zip archive")),
 		OPT_STRING('o', "output-directory", &option_output, N_("path"),
-			   N_("specify a destination for the bugreport file")),
+			   N_("specify a destination for the bugreport file(s)")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
-			   N_("specify a strftime format suffix for the filename")),
+			   N_("specify a strftime format suffix for the filename(s)")),
 		OPT_END()
 	};
 
@@ -119,6 +380,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 					    option_output ? option_output : "");
 	strbuf_addstr(&report_path, prefixed_filename);
 	strbuf_complete(&report_path, '/');
+	output_path_len = report_path.len;
 
 	strbuf_addstr(&report_path, "git-bugreport-");
 	strbuf_addftime(&report_path, option_suffix, localtime_r(&now, &tm), 0, 0);
@@ -133,6 +395,20 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		    report_path.buf);
 	}
 
+	/* Prepare diagnostics, if requested */
+	if (diagnose) {
+		struct strbuf zip_path = STRBUF_INIT;
+		strbuf_add(&zip_path, report_path.buf, output_path_len);
+		strbuf_addstr(&zip_path, "git-diagnostics-");
+		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+		strbuf_addstr(&zip_path, ".zip");
+
+		if (create_diagnostics_archive(&zip_path))
+			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
+
+		strbuf_release(&zip_path);
+	}
+
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
index 08f5fe9caef..3cf983aa67f 100755
--- a/t/t0091-bugreport.sh
+++ b/t/t0091-bugreport.sh
@@ -78,4 +78,24 @@ test_expect_success 'indicates populated hooks' '
 	test_cmp expect actual
 '
 
+test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  2022-08-01 21:14 ` [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test Victoria Dye via GitGitGadget
  2022-08-01 21:14 ` [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 22:18   ` Junio C Hamano
  2022-08-02  2:03   ` Ævar Arnfjörð Bjarmason
  2022-08-01 21:14 ` [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently Victoria Dye via GitGitGadget
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Avoid size_t overflow when reporting the available disk space in
'get_disk_info' by casting the block size and available block count to
'uint64_t' before multiplying them. Without this change, 'st_mult' would
(correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
bytes of available space.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/bugreport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 35b1fc48bf1..720889a37ad 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
 	}
 
 	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
 	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
 	strbuf_release(&buf);
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-08-01 21:14 ` [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 22:22   ` Junio C Hamano
  2022-08-01 21:14 ` [PATCH 5/7] builtin/bugreport.c: add '--no-report' option Victoria Dye via GitGitGadget
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

If a directory added to the '--diagnose' archiver does not exist, warn and
return 0 from 'add_directory_to_archiver()' rather than failing with a fatal
error. This handles a failure edge case where the '.git/logs' has not yet
been created when running 'git bugreport --diagnose', but extends to any
situation where a directory may be missing in the '.git' dir.

Now, when a directory is missing a warning is captured in the diagnostic
logs. This provides a user with more complete information than if 'git
bugreport' simply failed with an error.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/bugreport.c  |  8 +++++++-
 t/t0091-bugreport.sh | 11 ++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 720889a37ad..dea11f91386 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -176,12 +176,18 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 				     const char *path, int recurse)
 {
 	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
+	DIR *dir;
 	struct dirent *e;
 	struct strbuf buf = STRBUF_INIT;
 	size_t len;
 	int res = 0;
 
+	if (!file_exists(at_root ? "." : path)) {
+		warning(_("directory '%s' does not exist, will not be archived"), path);
+		return 0;
+	}
+
+	dir = opendir(at_root ? "." : path);
 	if (!dir)
 		return error_errno(_("could not open directory '%s'"), path);
 
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
index 3cf983aa67f..e9db89ef2c8 100755
--- a/t/t0091-bugreport.sh
+++ b/t/t0091-bugreport.sh
@@ -78,7 +78,7 @@ test_expect_success 'indicates populated hooks' '
 	test_cmp expect actual
 '
 
-test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
+test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
 	test_when_finished rm -rf report &&
 
 	git bugreport --diagnose -o report -s test >out &&
@@ -98,4 +98,13 @@ test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
 	grep "^Total: [0-9][0-9]*" out
 '
 
+test_expect_success '--diagnose warns when archived dir does not exist' '
+	test_when_finished rm -rf report &&
+
+	# Remove logs - not guaranteed to exist
+	rm -rf .git/logs &&
+	git bugreport --diagnose -o report -s test 2>err &&
+	grep "directory .\.git/logs. does not exist, will not be archived" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 5/7] builtin/bugreport.c: add '--no-report' option
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-08-01 21:14 ` [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 22:31   ` Junio C Hamano
  2022-08-01 21:14 ` [PATCH 6/7] scalar: use 'git bugreport --diagnose' in 'scalar diagnose' Victoria Dye via GitGitGadget
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a '--no-report' option to 'git bugreport' to avoid writing the
'git-bugreport-<suffix>.txt' file. This gives users the option of creating
only the diagnostic archive with '--diagnose' and mirroring the behavior of
the original 'scalar diagnose' as closely as possible.

If a user specifies '--no-report' *without* also specifying '--diagnose',
the 'git bugreport' operation is a no-op; a warning message is printed and
the command returns with a non-error exit code.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-bugreport.txt |  6 ++++++
 builtin/bugreport.c             | 16 +++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
index b55658bc287..5eae7a4f950 100644
--- a/Documentation/git-bugreport.txt
+++ b/Documentation/git-bugreport.txt
@@ -58,6 +58,12 @@ OPTIONS
 	archive is written to the same output directory as the bug report and is
 	named 'git-diagnostics-<formatted suffix>'.
 
+--no-report::
+	Do not write out a 'git-bugreport-<suffix>.txt' file. This option is
+	intended for use with `--diagnose` when only the diagnostic archive is
+	needed. If `--no-report` is used without `--diagnose`, `git bugreport`
+	is a no-op.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index dea11f91386..5ecff70276a 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -361,7 +361,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	int report = -1;
 	time_t now = time(NULL);
 	struct tm tm;
-	int diagnose = 0;
+	int diagnose = 0, skip_summary = 0;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	const char *user_relative_path = NULL;
@@ -371,6 +371,8 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	const struct option bugreport_options[] = {
 		OPT_BOOL(0, "diagnose", &diagnose,
 			 N_("generate a diagnostics zip archive")),
+		OPT_BOOL(0, "no-report", &skip_summary,
+			 N_("do not create a summary report")),
 		OPT_STRING('o', "output-directory", &option_output, N_("path"),
 			   N_("specify a destination for the bugreport file(s)")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
@@ -381,6 +383,11 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, bugreport_options,
 			     bugreport_usage, 0);
 
+	if (skip_summary && !diagnose) {
+		warning(_("Nothing to do!"));
+		return 0;
+	}
+
 	/* Prepare the path to put the result */
 	prefixed_filename = prefix_filename(prefix,
 					    option_output ? option_output : "");
@@ -415,6 +422,13 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		strbuf_release(&zip_path);
 	}
 
+	if (skip_summary) {
+		free(prefixed_filename);
+		strbuf_release(&buffer);
+		strbuf_release(&report_path);
+		return 0;
+	}
+
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 6/7] scalar: use 'git bugreport --diagnose' in 'scalar diagnose'
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-08-01 21:14 ` [PATCH 5/7] builtin/bugreport.c: add '--no-report' option Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 21:14 ` [PATCH 7/7] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Replace implementation of 'scalar diagnose' with an internal invocation of
'git bugreport --diagnose --no-report'. The '--diagnose' option of 'git
bugreport' was implemented to mirror what 'scalar diagnose' does, taking
most of its code directly from 'scalar.c'. Remove the now-duplicate code in
'scalar.c' and have 'scalar diagnose' call 'git bugreport' to create the
diagnostics archive.

This introduces two (minor) changes to the output of 'scalar diagnose':
changing "Enlistment root" to "Repository root" in 'diagnostics.log'
("enlistment root" was inaccurate anyway, as the reported path always
pointed to the root of the repository), and changing the prefix of the zip
archive from 'scalar_' to 'git-diagnostics-'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 271 ++--------------------------------------
 1 file changed, 7 insertions(+), 264 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 97e71fe19cd..7b1953605bd 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,7 +11,6 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "archive.h"
 #include "object-store.h"
 
 /*
@@ -262,99 +261,6 @@ static int unregister_dir(void)
 	return res;
 }
 
-static int add_directory_to_archiver(struct strvec *archiver_args,
-					  const char *path, int recurse)
-{
-	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
-	struct dirent *e;
-	struct strbuf buf = STRBUF_INIT;
-	size_t len;
-	int res = 0;
-
-	if (!dir)
-		return error_errno(_("could not open directory '%s'"), path);
-
-	if (!at_root)
-		strbuf_addf(&buf, "%s/", path);
-	len = buf.len;
-	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
-
-	while (!res && (e = readdir(dir))) {
-		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
-			continue;
-
-		strbuf_setlen(&buf, len);
-		strbuf_addstr(&buf, e->d_name);
-
-		if (e->d_type == DT_REG)
-			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
-		else if (e->d_type != DT_DIR)
-			warning(_("skipping '%s', which is neither file nor "
-				  "directory"), buf.buf);
-		else if (recurse &&
-			 add_directory_to_archiver(archiver_args,
-						   buf.buf, recurse) < 0)
-			res = -1;
-	}
-
-	closedir(dir);
-	strbuf_release(&buf);
-	return res;
-}
-
-#ifndef WIN32
-#include <sys/statvfs.h>
-#endif
-
-static int get_disk_info(struct strbuf *out)
-{
-#ifdef WIN32
-	struct strbuf buf = STRBUF_INIT;
-	char volume_name[MAX_PATH], fs_name[MAX_PATH];
-	DWORD serial_number, component_length, flags;
-	ULARGE_INTEGER avail2caller, total, avail;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
-		error(_("could not determine free disk size for '%s'"),
-		      buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_setlen(&buf, offset_1st_component(buf.buf));
-	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
-				   &serial_number, &component_length, &flags,
-				   fs_name, sizeof(fs_name))) {
-		error(_("could not get info for '%s'"), buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, avail2caller.QuadPart);
-	strbuf_addch(out, '\n');
-	strbuf_release(&buf);
-#else
-	struct strbuf buf = STRBUF_INIT;
-	struct statvfs stat;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (statvfs(buf.buf, &stat) < 0) {
-		error_errno(_("could not determine free disk size for '%s'"),
-			    buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
-	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
-	strbuf_release(&buf);
-#endif
-	return 0;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -595,83 +501,6 @@ cleanup:
 	return res;
 }
 
-static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
-				   const char *file_name, void *data)
-{
-	struct strbuf *buf = data;
-	struct stat st;
-
-	if (!stat(full_path, &st))
-		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
-			    (uintmax_t)st.st_size);
-}
-
-static int dir_file_stats(struct object_directory *object_dir, void *data)
-{
-	struct strbuf *buf = data;
-
-	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
-
-	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
-				  data);
-
-	return 0;
-}
-
-static int count_files(char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count = 0;
-
-	if (!dir)
-		return 0;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
-			count++;
-
-	closedir(dir);
-	return count;
-}
-
-static void loose_objs_stats(struct strbuf *buf, const char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count;
-	int total = 0;
-	unsigned char c;
-	struct strbuf count_path = STRBUF_INIT;
-	size_t base_path_len;
-
-	if (!dir)
-		return;
-
-	strbuf_addstr(buf, "Object directory stats for ");
-	strbuf_add_absolute_path(buf, path);
-	strbuf_addstr(buf, ":\n");
-
-	strbuf_add_absolute_path(&count_path, path);
-	strbuf_addch(&count_path, '/');
-	base_path_len = count_path.len;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) &&
-		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
-		    !hex_to_bytes(&c, e->d_name, 1)) {
-			strbuf_setlen(&count_path, base_path_len);
-			strbuf_addstr(&count_path, e->d_name);
-			total += (count = count_files(count_path.buf));
-			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
-		}
-
-	strbuf_addf(buf, "Total: %d loose objects", total);
-
-	strbuf_release(&count_path);
-	closedir(dir);
-}
-
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -681,106 +510,20 @@ static int cmd_diagnose(int argc, const char **argv)
 		N_("scalar diagnose [<enlistment>]"),
 		NULL
 	};
-	struct strbuf zip_path = STRBUF_INIT;
-	struct strvec archiver_args = STRVEC_INIT;
-	char **argv_copy = NULL;
-	int stdout_fd = -1, archiver_fd = -1;
-	time_t now = time(NULL);
-	struct tm tm;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf diagnostics_path = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
 			     usage, 0);
 
-	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
-
-	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
-	strbuf_addftime(&zip_path,
-			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
-	strbuf_addstr(&zip_path, ".zip");
-	switch (safe_create_leading_directories(zip_path.buf)) {
-	case SCLD_EXISTS:
-	case SCLD_OK:
-		break;
-	default:
-		error_errno(_("could not create directory for '%s'"),
-			    zip_path.buf);
-		goto diagnose_cleanup;
-	}
-	stdout_fd = dup(1);
-	if (stdout_fd < 0) {
-		res = error_errno(_("could not duplicate stdout"));
-		goto diagnose_cleanup;
-	}
-
-	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
-	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
-		res = error_errno(_("could not redirect output"));
-		goto diagnose_cleanup;
-	}
-
-	init_zip_archiver();
-	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
-	get_version_info(&buf, 1);
-
-	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
-	get_disk_info(&buf);
-	write_or_die(stdout_fd, buf.buf, buf.len);
-	strvec_pushf(&archiver_args,
-		     "--add-virtual-file=diagnostics.log:%.*s",
-		     (int)buf.len, buf.buf);
+	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_path);
+	strbuf_addstr(&diagnostics_path, "/.scalarDiagnostics");
 
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
-	dir_file_stats(the_repository->objects->odb, &buf);
-	foreach_alt_odb(dir_file_stats, &buf);
-	strvec_push(&archiver_args, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
-	loose_objs_stats(&buf, ".git/objects");
-	strvec_push(&archiver_args, buf.buf);
-
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
-		goto diagnose_cleanup;
-
-	strvec_pushl(&archiver_args, "--prefix=",
-		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
-
-	/* `write_archive()` modifies the `argv` passed to it. Let it. */
-	argv_copy = xmemdupz(archiver_args.v,
-			     sizeof(char *) * archiver_args.nr);
-	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
-			    the_repository, NULL, 0);
-	if (res) {
-		error(_("failed to write archive"));
-		goto diagnose_cleanup;
-	}
-
-	if (!res)
-		fprintf(stderr, "\n"
-		       "Diagnostics complete.\n"
-		       "All of the gathered info is captured in '%s'\n",
-		       zip_path.buf);
-
-diagnose_cleanup:
-	if (archiver_fd >= 0) {
-		close(1);
-		dup2(stdout_fd, 1);
-	}
-	free(argv_copy);
-	strvec_clear(&archiver_args);
-	strbuf_release(&zip_path);
-	strbuf_release(&buf);
+	if (run_git("bugreport", "--diagnose", "--no-report",
+		    "-s", "%Y%m%d_%H%M%S", "-o", diagnostics_path.buf, NULL) < 0)
+		res = -1;
 
+	strbuf_release(&diagnostics_path);
 	return res;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 7/7] scalar: update technical doc roadmap
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-08-01 21:14 ` [PATCH 6/7] scalar: use 'git bugreport --diagnose' in 'scalar diagnose' Victoria Dye via GitGitGadget
@ 2022-08-01 21:14 ` Victoria Dye via GitGitGadget
  2022-08-01 21:34 ` [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Junio C Hamano
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-01 21:14 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, johannes.schindelin, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update the Scalar roadmap to reflect the completion of generalizing 'scalar
diagnose' into 'git bugreport --diagnose'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/technical/scalar.txt | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
index 08bc09c225a..1fa2f6d5f91 100644
--- a/Documentation/technical/scalar.txt
+++ b/Documentation/technical/scalar.txt
@@ -84,6 +84,9 @@ series have been accepted:
 
 - `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
 
+- `scalar-generalize-diagnose`: Move the functionality of `scalar diagnose`
+  into `git bugreport --diagnose`.
+
 Roughly speaking (and subject to change), the following series are needed to
 "finish" this initial version of Scalar:
 
@@ -91,12 +94,6 @@ Roughly speaking (and subject to change), the following series are needed to
   and implement `scalar help`. At the end of this series, Scalar should be
   feature-complete from the perspective of a user.
 
-- Generalize features not specific to Scalar: In the spirit of making Scalar
-  configure only what is needed for large repo performance, move common
-  utilities into other parts of Git. Some of this will be internal-only, but one
-  major change will be generalizing `scalar diagnose` for use with any Git
-  repository.
-
 - Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
   `git`, including updates to build and install it with the rest of Git. This
   change will incorporate Scalar into the Git CI and test framework, as well as
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose'
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-08-01 21:14 ` [PATCH 7/7] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
@ 2022-08-01 21:34 ` Junio C Hamano
  2022-08-02  2:49 ` Ævar Arnfjörð Bjarmason
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
  9 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 21:34 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> As part of the preparation for moving Scalar out of 'contrib/' and into Git,
> this series moves the functionality of 'scalar diagnose' into a new option
> ('--diagnose') for 'git bugreport'. This change further aligns Scalar with
> the objective [1] of having it only contain functionality and settings that
> benefit large Git repositories, but not all repositories. The diagnostics
> reported by 'scalar diagnose' relevant for investigating issues in any Git
> repository, so generating them should be part of a "normal" Git builtin.

;-)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test
  2022-08-01 21:14 ` [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test Victoria Dye via GitGitGadget
@ 2022-08-01 21:46   ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 21:46 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
> 'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
> tests on systems where 'unzip' is not in the system path.

Makes sense.  It makes it more in line with how a handful of tests
in t/ already use the zip archive.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option
  2022-08-01 21:14 ` [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-01 22:16   ` Junio C Hamano
  2022-08-02 15:40     ` Victoria Dye
  2022-08-02  2:17   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 22:16 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Create a '--diagnose' option for 'git bugreport' to collect additional
> information about the repository and write it to a zipped archive.
>
> The "diagnose" functionality was originally implemented for Scalar in
> aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
> diagnostics gathered are not specific to Scalar-cloned repositories and
> could be useful when diagnosing issues in any Git repository.
>
> Note that, while this patch appears large, it is mostly copied directly out
> of 'scalar.c'. Specifically, the functions
>
> - dir_file_stats_objects()
> - dir_file_stats()
> - count_files()
> - loose_objs_stats()
> - add_directory_to_archiver()
> - get_disk_info()

Yup.  As this does not "move" code across from older place to the
new home, it takes a bit of processing to verify the above claim,
but

 $ git blame -C -C -C -s -b master.. -- builtin/bugreport.c

shows that these are largely verbatim copies.

> +#ifndef WIN32
> +#include <sys/statvfs.h>
> +#endif
> +
> +static int get_disk_info(struct strbuf *out)
> +{
> +#ifdef WIN32
> +	struct strbuf buf = STRBUF_INIT;
> +...
> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> +	strbuf_humanise_bytes(out, avail2caller.QuadPart);
> +...
> +#else
> +...
> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> +	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +...
> +#endif
> +	return 0;
> +}

As a proper part of Git, this part should probably be factored out
so that a platform specific helper function, implemented in compat/
layer, grabs "available disk space" number in off_t and the caller
of the above function becomes

	strbuf_realpath(&dir, ".", 1);
	strbuf_addf(out, "Available space on '%s:' ", dir.buf);
	strbuf_humanise_bytes(out, get_disk_size(dir.buf));

or something, without having to have #ifdef droppings.

> +static int create_diagnostics_archive(struct strbuf *zip_path)
> +{

Large part of this function is also lifted from scalar, and it looks
OK.  One thing I noticed is that "res" is explicitly initialized to
0, but given that the way the code is structured to use the "we
process sequencially in successful case, and branch out by 'goto'
immediately when we see a breakage" pattern, it may be better to
initialize it to -1 (i.e. assume error), or even better, leave it
uninitialized (i.e. let the compiler notice if a jump to cleanup is
made without setting res appropriately).

> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		close(1);
> +		dup2(stdout_fd, 1);
> +	}
> +	free(argv_copy);
> +	strvec_clear(&archiver_args);
> +	strbuf_release(&buf);

Hmph, stdout_fd is a copy of the file descriptor 1 that was saved
away at the beginning.  Then archiver_fd was created to write into
the zip archive, and during the bulk of the function it was dup2'ed
to the file descriptor 1, to make anything written to the latter
appear in the zip output.

When we successfully opened archive_fd but failed to dup2(), we may
close a wrong file desciptor 1 here, but we recover from that by
using the saved-away stdout_fd, so we'd be OK.  If we did dup2(),
then we would be OK, too.

I am wondering if archiver_fd itself is leaking here, though.

Also, if we failed to open archiver_fd, then we have stdout_fd
leaking here, I suspect.

> +	return res;
> +}

Other than that, looks good to me.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-01 21:14 ` [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow Victoria Dye via GitGitGadget
@ 2022-08-01 22:18   ` Junio C Hamano
  2022-08-02 16:26     ` Victoria Dye
  2022-08-02  2:03   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 22:18 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Avoid size_t overflow when reporting the available disk space in
> 'get_disk_info' by casting the block size and available block count to
> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
> bytes of available space.

Sane.  But shouldn't the cast be to off_t, which is what
strbuf_humanise_bytes() takes anyway?

>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/bugreport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 35b1fc48bf1..720889a37ad 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>  	}
>  
>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>  	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
>  	strbuf_release(&buf);
>  #endif

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently
  2022-08-01 21:14 ` [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-01 22:22   ` Junio C Hamano
  2022-08-02 15:43     ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 22:22 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  	int at_root = !*path;
> -	DIR *dir = opendir(at_root ? "." : path);
> +	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t len;
>  	int res = 0;
>  
> +	if (!file_exists(at_root ? "." : path)) {
> +		warning(_("directory '%s' does not exist, will not be archived"), path);
> +		return 0;
> +	}
> +
> +	dir = opendir(at_root ? "." : path);
>  	if (!dir)
>  		return error_errno(_("could not open directory '%s'"), path);

I am not sure if TOCTTOU is how we want to be more gentle.  Do we
rather want to do something like this

	dir = opendir(...);
	if (!dir) {
		if (errno == ENOENT) {
			warning(_("not archiving missing directory '%s'", path);
		        return 0;
		}
                return error_errno(_("cannot open directory '%s'"), path);
	}

or am I missing something subtle?

Thanks.

> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> index 3cf983aa67f..e9db89ef2c8 100755
> --- a/t/t0091-bugreport.sh
> +++ b/t/t0091-bugreport.sh
> @@ -78,7 +78,7 @@ test_expect_success 'indicates populated hooks' '
>  	test_cmp expect actual
>  '
>  
> -test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
> +test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
>  	test_when_finished rm -rf report &&
>  
>  	git bugreport --diagnose -o report -s test >out &&
> @@ -98,4 +98,13 @@ test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>  	grep "^Total: [0-9][0-9]*" out
>  '
>  
> +test_expect_success '--diagnose warns when archived dir does not exist' '
> +	test_when_finished rm -rf report &&
> +
> +	# Remove logs - not guaranteed to exist
> +	rm -rf .git/logs &&
> +	git bugreport --diagnose -o report -s test 2>err &&
> +	grep "directory .\.git/logs. does not exist, will not be archived" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/7] builtin/bugreport.c: add '--no-report' option
  2022-08-01 21:14 ` [PATCH 5/7] builtin/bugreport.c: add '--no-report' option Victoria Dye via GitGitGadget
@ 2022-08-01 22:31   ` Junio C Hamano
  2022-08-02 19:46     ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-01 22:31 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Add a '--no-report' option to 'git bugreport' to avoid writing the
> 'git-bugreport-<suffix>.txt' file. This gives users the option of creating
> only the diagnostic archive with '--diagnose' and mirroring the behavior of
> the original 'scalar diagnose' as closely as possible.
>
> If a user specifies '--no-report' *without* also specifying '--diagnose',
> the 'git bugreport' operation is a no-op; a warning message is printed and
> the command returns with a non-error exit code.

I think this makes sense from scalar side, and I have no objection
against this "--no-report" feature existing, but I wonder if those
who want to send report may want to have a handy way to tell the
command to "include" the diag archive in their report (instead of
creating separate report and diagnose files, having to attach two
files to their message).  Perhaps that is unneeded, or perhaps that
comes in later patches in the series, I dunno.

> +--no-report::
> +	Do not write out a 'git-bugreport-<suffix>.txt' file. This option is
> +	intended for use with `--diagnose` when only the diagnostic archive is
> +	needed. If `--no-report` is used without `--diagnose`, `git bugreport`
> +	is a no-op.

I wonder if thinking it this way may make the UI simpler to explain.

The "git bugreport" is capable of showing report and diagnose with
these two orthogonal options, i.e.

	--report::	writes bugreport file
	--diagnose::	writes diagnostic archive

And for backward compatibility reasons, the command pretends as if
you gave it "--report" when you run it without either.

That way, "bugreport --diagnose" will just show diagnostic archive
without having to pass "--no-report".  There is no need for "nothing
to do", either.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-01 21:14 ` [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow Victoria Dye via GitGitGadget
  2022-08-01 22:18   ` Junio C Hamano
@ 2022-08-02  2:03   ` Ævar Arnfjörð Bjarmason
  2022-08-02 16:26     ` Victoria Dye
  1 sibling, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02  2:03 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Avoid size_t overflow when reporting the available disk space in
> 'get_disk_info' by casting the block size and available block count to
> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
> bytes of available space.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/bugreport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 35b1fc48bf1..720889a37ad 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>  	}
>  
>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);

Doesn't this remove the overflow guard on 64 bit systems to support
those 32 bit systems?

I also don't tthink it's correct that this would "correctly
report...". Before this we were simply assuming that "size_t" and
"unsigned long" & "fsblkcnt_t" would all yield the same thing.

But I don't think per [1] and [2] that POSIX is giving us any guarantees
in that regard, even on 32 bit systems, but perhaps it's a reasonable
assumption in practice.

1. https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
2. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option
  2022-08-01 21:14 ` [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
  2022-08-01 22:16   ` Junio C Hamano
@ 2022-08-02  2:17   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02  2:17 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
>  Documentation/git-bugreport.txt |  11 +-
>  builtin/bugreport.c             | 282 +++++++++++++++++++++++++++++++-
>  t/t0091-bugreport.sh            |  20 +++
>  3 files changed, 309 insertions(+), 4 deletions(-)
> [...]

Maybe it's not easy in this case, but I wonder if this series can't be
re-arranged in a way that more directly benefits from the diff move
detection.

E.g. if we moved the unchanged functions to a new repo-disk-usage.c or
something we could have an intermediate step of having both use that,
and then going forward would work towards a better lib/built-in
split-up...

> --- a/Documentation/git-bugreport.txt
> +++ b/Documentation/git-bugreport.txt
> @@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
>  SYNOPSIS
>  --------
>  [verse]
> -'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
> +'git bugreport' [<options>]
> [...]
>  static const char * const bugreport_usage[] = {
> -	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
> +	N_("git bugreport [<options>]"),
>  	NULL
>  };

We have some built-ins that punt on re-listing the synopsis in the -h
output, but we always list the full usage in the SYNOPSIS.

I think both of these hunks should be dropped, instead we should
(presumably) add a "git bugreport --diagnose" to this, and if it
combines (or not) with other options, let's update both accordingly.

> [...]
> +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
>

Is the "scalar-diagnose" here a mistake?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose'
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-08-01 21:34 ` [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Junio C Hamano
@ 2022-08-02  2:49 ` Ævar Arnfjörð Bjarmason
  2022-08-02 19:48   ` Victoria Dye
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
  9 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02  2:49 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
> bugreport --diagnose', so I went with the latter, simpler approach.

I really wanted to like this, but I find the end result here really
confusing from a UX perspective.

You can now run "git bugreport --diagnose", which creates a giant *.zip
file to go along with your *.txt, but your *.txt makes no reference to
it.

Should you ... attach it to your bug report to this mailing list, do
something else?

The documentation doesn't offer much in the way of hints, other than
suggesting (with --no-report) that this --diagnose is for something
entirely different (and that's how "scalar" uses it).

I know what it's really for after reading this series, but for "git
bugreport" in particular we should be really careful about not making
the UX confusing.

The generated *.zip contains some really deep info about your repo (and
not just metadata, e.g. copies of the index, various logs etc.), someone
e.g. in a proprietary setting really doesn't want to be sharing that
info.

So I would like to see real integration into "git bugreport", i.e. for
us to smartly report more repository metrics, e.g. approx number of
loose objects, the sort of state "__git_ps1" might report, etc.

But I think the end-state here makes things much more confusing for
users.

> An alternative implementation considered was creating a new 'git diagnose'
> builtin, but the new command would end up duplicating much of
> 'builtin/bugreport.c'.

It seems we always "return" from cmd_bugreport() quite quickly, and we
basically only share the code to create the output directory. Just
duplicating or sharing that seems like a much better approach for now
than creating the above UX confusion.

Note that you can also share code between multiple built-ins, even in
the same file (see e.g. builtin/{checkout,log}.c). So we could even
share something like the safe_create_leading_directories() calling code
in bugreport.c without libifying it.

> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
> 'git bugreport --diagnose', it is not being deprecated in this series.
> Although deprecation -> removal could be a future cleanup effort, 'scalar
> diagnose' is kept around for now as an alias for users already accustomed to
> using it in 'scalar'.

We don't have a "make install" to get a "scalar" onto user's systems
yet, do we really need to worry about those users?

Or is this a reference to the out-of-tree version of "scalar", not
git.git's?


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option
  2022-08-01 22:16   ` Junio C Hamano
@ 2022-08-02 15:40     ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 15:40 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a '--diagnose' option for 'git bugreport' to collect additional
>> information about the repository and write it to a zipped archive.
>>
>> The "diagnose" functionality was originally implemented for Scalar in
>> aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
>> diagnostics gathered are not specific to Scalar-cloned repositories and
>> could be useful when diagnosing issues in any Git repository.
>>
>> Note that, while this patch appears large, it is mostly copied directly out
>> of 'scalar.c'. Specifically, the functions
>>
>> - dir_file_stats_objects()
>> - dir_file_stats()
>> - count_files()
>> - loose_objs_stats()
>> - add_directory_to_archiver()
>> - get_disk_info()
> 
> Yup.  As this does not "move" code across from older place to the
> new home, it takes a bit of processing to verify the above claim,
> but
> 
>  $ git blame -C -C -C -s -b master.. -- builtin/bugreport.c
> 
> shows that these are largely verbatim copies.
> 
>> +#ifndef WIN32
>> +#include <sys/statvfs.h>
>> +#endif
>> +
>> +static int get_disk_info(struct strbuf *out)
>> +{
>> +#ifdef WIN32
>> +	struct strbuf buf = STRBUF_INIT;
>> +...
>> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> +	strbuf_humanise_bytes(out, avail2caller.QuadPart);
>> +...
>> +#else
>> +...
>> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> +	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +...
>> +#endif
>> +	return 0;
>> +}
> 
> As a proper part of Git, this part should probably be factored out
> so that a platform specific helper function, implemented in compat/
> layer, grabs "available disk space" number in off_t and the caller
> of the above function becomes
> 
> 	strbuf_realpath(&dir, ".", 1);
> 	strbuf_addf(out, "Available space on '%s:' ", dir.buf);
> 	strbuf_humanise_bytes(out, get_disk_size(dir.buf));
> 
> or something, without having to have #ifdef droppings.
> 

This makes sense, I'll probably follow an approach similar to what was done
with 'compat/compiler.h' in [1] (unless adding to 'git-compat-util.h' would
be more appropriate?).

[1] https://lore.kernel.org/git/20200416211807.60811-6-emilyshaffer@google.com/

>> +static int create_diagnostics_archive(struct strbuf *zip_path)
>> +{
> 
> Large part of this function is also lifted from scalar, and it looks
> OK.  One thing I noticed is that "res" is explicitly initialized to
> 0, but given that the way the code is structured to use the "we
> process sequencially in successful case, and branch out by 'goto'
> immediately when we see a breakage" pattern, it may be better to
> initialize it to -1 (i.e. assume error), or even better, leave it
> uninitialized (i.e. let the compiler notice if a jump to cleanup is
> made without setting res appropriately).
> 

I'll go with the "uninitialized" approach in the re-roll; I like the
simplicity of relying on the compiler to determine if it's unassigned.

>> +diagnose_cleanup:
>> +	if (archiver_fd >= 0) {
>> +		close(1);
>> +		dup2(stdout_fd, 1);
>> +	}
>> +	free(argv_copy);
>> +	strvec_clear(&archiver_args);
>> +	strbuf_release(&buf);
> 
> Hmph, stdout_fd is a copy of the file descriptor 1 that was saved
> away at the beginning.  Then archiver_fd was created to write into
> the zip archive, and during the bulk of the function it was dup2'ed
> to the file descriptor 1, to make anything written to the latter
> appear in the zip output.
> 
> When we successfully opened archive_fd but failed to dup2(), we may
> close a wrong file desciptor 1 here, but we recover from that by
> using the saved-away stdout_fd, so we'd be OK.  If we did dup2(),
> then we would be OK, too.
> 
> I am wondering if archiver_fd itself is leaking here, though.
> 
> Also, if we failed to open archiver_fd, then we have stdout_fd
> leaking here, I suspect.
> 

If I'm not mistaken, both 'archiver_fd' and 'stdout_fd' are always leaked if
they're successfully created (they're never 'close()'d). There's also an
unnecessary check for 'archiver_fd < 0', since 'xopen()' will die if it
can't open the file. And, as you mentioned, the wrong file descriptor 1 is
closed if the 'dup2()' of 'archiver_fd' fails.

I'll clean this up for V2, thanks.

>> +	return res;
>> +}
> 
> Other than that, looks good to me.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently
  2022-08-01 22:22   ` Junio C Hamano
@ 2022-08-02 15:43     ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 15:43 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  	int at_root = !*path;
>> -	DIR *dir = opendir(at_root ? "." : path);
>> +	DIR *dir;
>>  	struct dirent *e;
>>  	struct strbuf buf = STRBUF_INIT;
>>  	size_t len;
>>  	int res = 0;
>>  
>> +	if (!file_exists(at_root ? "." : path)) {
>> +		warning(_("directory '%s' does not exist, will not be archived"), path);
>> +		return 0;
>> +	}
>> +
>> +	dir = opendir(at_root ? "." : path);
>>  	if (!dir)
>>  		return error_errno(_("could not open directory '%s'"), path);
> 
> I am not sure if TOCTTOU is how we want to be more gentle.  Do we
> rather want to do something like this
> 
> 	dir = opendir(...);
> 	if (!dir) {
> 		if (errno == ENOENT) {
> 			warning(_("not archiving missing directory '%s'", path);
> 		        return 0;
> 		}
>                 return error_errno(_("cannot open directory '%s'"), path);
> 	}
> 
> or am I missing something subtle?
> 

The "gentleness" was meant to be a reference only to the error -> warning
change, the TOCTTOU change was just a miss by me. I'll fix it in the next
version, thanks!

> Thanks.
> 
>> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
>> index 3cf983aa67f..e9db89ef2c8 100755
>> --- a/t/t0091-bugreport.sh
>> +++ b/t/t0091-bugreport.sh
>> @@ -78,7 +78,7 @@ test_expect_success 'indicates populated hooks' '
>>  	test_cmp expect actual
>>  '
>>  
>> -test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>> +test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
>>  	test_when_finished rm -rf report &&
>>  
>>  	git bugreport --diagnose -o report -s test >out &&
>> @@ -98,4 +98,13 @@ test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>>  	grep "^Total: [0-9][0-9]*" out
>>  '
>>  
>> +test_expect_success '--diagnose warns when archived dir does not exist' '
>> +	test_when_finished rm -rf report &&
>> +
>> +	# Remove logs - not guaranteed to exist
>> +	rm -rf .git/logs &&
>> +	git bugreport --diagnose -o report -s test 2>err &&
>> +	grep "directory .\.git/logs. does not exist, will not be archived" err
>> +'
>> +
>>  test_done


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-01 22:18   ` Junio C Hamano
@ 2022-08-02 16:26     ` Victoria Dye
  2022-08-02 20:51       ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 16:26 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Avoid size_t overflow when reporting the available disk space in
>> 'get_disk_info' by casting the block size and available block count to
>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>> bytes of available space.
> 
> Sane.  But shouldn't the cast be to off_t, which is what
> strbuf_humanise_bytes() takes anyway?
> 

I chose 'uint64_t' to mimic 'throughput_string()' [1], but the signed
'off_t' is a better choice given its use in 'strbuf_humanise_bytes()'.

On a related note, while writing this I made the (unsubstantiated)
assumption that 'off_t' would be a 64-bit int, even on 32-bit systems. Your
comment prompted me to confirm that assumption; while 'off_t' is not always
guaranteed to be an int64 by default [2], Git is compiled with '#define
_FILE_OFFSET_BITS 64' [3] so 'off_t' is equivalent to 'off64_t'.

I'll update the casts to 'off_t' in V2. Thanks!

[1] https://lore.kernel.org/git/20171110173956.25105-4-newren@gmail.com/
[2] https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-off_005ft
[3] https://lore.kernel.org/git/7vr6smc1de.fsf@assigned-by-dhcp.cox.net/

>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/bugreport.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>> index 35b1fc48bf1..720889a37ad 100644
>> --- a/builtin/bugreport.c
>> +++ b/builtin/bugreport.c
>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>  	}
>>  
>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>>  	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
>>  	strbuf_release(&buf);
>>  #endif


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-02  2:03   ` Ævar Arnfjörð Bjarmason
@ 2022-08-02 16:26     ` Victoria Dye
  2022-08-03 12:25       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 16:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Avoid size_t overflow when reporting the available disk space in
>> 'get_disk_info' by casting the block size and available block count to
>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>> bytes of available space.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/bugreport.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>> index 35b1fc48bf1..720889a37ad 100644
>> --- a/builtin/bugreport.c
>> +++ b/builtin/bugreport.c
>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>  	}
>>  
>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
> 
> Doesn't this remove the overflow guard on 64 bit systems to support
> those 32 bit systems?
> 

It does, but the total disk space available on a system should be able to
fit into a 64-bit integer. I considered adding an explicit
'unsigned_mult_overflows', but decided against it because it's almost
certainly overkill for such an implausible edge case.

> I also don't tthink it's correct that this would "correctly
> report...". Before this we were simply assuming that "size_t" and
> "unsigned long" & "fsblkcnt_t" would all yield the same thing.
> 

The point I was making is that, if your 'size_t' is 32 bits, but you have
more than ~4GB of disk space available on your system, the result of the
multiplication will overflow 'size_t'. So, 'st_mult' failing because it
detects an overflow is "correct", rather than e.g. a false positive.

> But I don't think per [1] and [2] that POSIX is giving us any guarantees
> in that regard, even on 32 bit systems, but perhaps it's a reasonable
> assumption in practice.
> 
> 1. https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
> 2. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/7] builtin/bugreport.c: add '--no-report' option
  2022-08-01 22:31   ` Junio C Hamano
@ 2022-08-02 19:46     ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 19:46 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, emilyshaffer

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Add a '--no-report' option to 'git bugreport' to avoid writing the
>> 'git-bugreport-<suffix>.txt' file. This gives users the option of creating
>> only the diagnostic archive with '--diagnose' and mirroring the behavior of
>> the original 'scalar diagnose' as closely as possible.
>>
>> If a user specifies '--no-report' *without* also specifying '--diagnose',
>> the 'git bugreport' operation is a no-op; a warning message is printed and
>> the command returns with a non-error exit code.
> 
> I think this makes sense from scalar side, and I have no objection
> against this "--no-report" feature existing, but I wonder if those
> who want to send report may want to have a handy way to tell the
> command to "include" the diag archive in their report (instead of
> creating separate report and diagnose files, having to attach two
> files to their message).  Perhaps that is unneeded, or perhaps that
> comes in later patches in the series, I dunno.
> 

I tried finding where in the documentation there are instructions on sending
a bug report to the mailing list, but didn't see anything (otherwise, I'd
add some info on '--diagnose' there). Maybe Emily would know?

If instructions like that don't exist, I'll update the command documentation
here to clarify that '--diagnose' generates an attachment that includes more
complete repository information to aid in debugging.

>> +--no-report::
>> +	Do not write out a 'git-bugreport-<suffix>.txt' file. This option is
>> +	intended for use with `--diagnose` when only the diagnostic archive is
>> +	needed. If `--no-report` is used without `--diagnose`, `git bugreport`
>> +	is a no-op.
> 
> I wonder if thinking it this way may make the UI simpler to explain.
> 
> The "git bugreport" is capable of showing report and diagnose with
> these two orthogonal options, i.e.
> 
> 	--report::	writes bugreport file
> 	--diagnose::	writes diagnostic archive
> 
> And for backward compatibility reasons, the command pretends as if
> you gave it "--report" when you run it without either.
> 
> That way, "bugreport --diagnose" will just show diagnostic archive
> without having to pass "--no-report".  There is no need for "nothing
> to do", either.
> 

I like the simplicity of this, but I'd imagine that a user would want to
generate diagnostics *with* a report more often than without one. The cases
I can think of for "standalone diagnostics" are: internally in 'scalar
diagnose', someone requesting more info after an initial bug report, or a
user looking into something on their own.

Maybe I could replace '--no-report' with '--diagnostics-only'? Then the
three modes of use would be:

- 'git bugreport': report only (most common usage)
- 'git bugreport --diagnose': report + diagnostics
- 'git bugreport --diagnostics-only': diagnostics only (least common usage)

It would eliminate the need for "nothing to do" while making it (I think?)
clearer to a user why you would want to use any of these options.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose'
  2022-08-02  2:49 ` Ævar Arnfjörð Bjarmason
@ 2022-08-02 19:48   ` Victoria Dye
  2022-08-03 12:34     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-02 19:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
> 
>> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
>> bugreport --diagnose', so I went with the latter, simpler approach.
> 
> I really wanted to like this, but I find the end result here really
> confusing from a UX perspective.
> 
> You can now run "git bugreport --diagnose", which creates a giant *.zip
> file to go along with your *.txt, but your *.txt makes no reference to
> it.
> 
> Should you ... attach it to your bug report to this mailing list, do
> something else?
> 
> The documentation doesn't offer much in the way of hints, other than
> suggesting (with --no-report) that this --diagnose is for something
> entirely different (and that's how "scalar" uses it).
> 
> I know what it's really for after reading this series, but for "git
> bugreport" in particular we should be really careful about not making
> the UX confusing.
> 
> The generated *.zip contains some really deep info about your repo (and
> not just metadata, e.g. copies of the index, various logs etc.), someone
> e.g. in a proprietary setting really doesn't want to be sharing that
> info.
> 
> So I would like to see real integration into "git bugreport", i.e. for
> us to smartly report more repository metrics, e.g. approx number of
> loose objects, the sort of state "__git_ps1" might report, etc.
> 
> But I think the end-state here makes things much more confusing for
> users.
> 

The "confusing UX" you describe here doesn't seem to be an inherent issue
with the implementation (nor is it as insurmountable as you're implying), it
largely appears to be an issue of under-documentation. I'll improve that in
V2 [1], but I want clarify what I was/am going for here as well.

In the context of a bug report, the diagnostics are intended to be used as
supplemental information to aid in debugging (i.e., attached with the report
in the email to the list). They're especially valuable when a bug reporter
isn't very familiar with Git internals and they can't reproduce the issue. A
lot of bugs can be investigated without those diagnostics, though, which is
why '--diagnose' isn't "on" by default.

There are also valid use-cases (beyond 'scalar diagnose') for '--no-report':
someone requests more information after looking into an already-generated
report, or a developer wants to investigate a bug on their own and use the
diagnostics as a "starting point" to guide more in-depth debugging. 

As for the proprietary data issue, I'd be open to having an option to
configure which diagnostics a user wants (either something like '--diagnose
<level>' or a separate option entirely). I'm pretty indifferent on the UI,
though, so I'll defer to other contributors on 1) if they want that feature,
and 2) what they think that should look like.

[1] https://lore.kernel.org/git/f3235afe-25cc-21a4-fc35-56e35d6be0ce@github.com/

>> An alternative implementation considered was creating a new 'git diagnose'
>> builtin, but the new command would end up duplicating much of
>> 'builtin/bugreport.c'.
> 
> It seems we always "return" from cmd_bugreport() quite quickly, and we
> basically only share the code to create the output directory. Just
> duplicating or sharing that seems like a much better approach for now
> than creating the above UX confusion.
> 
> Note that you can also share code between multiple built-ins, even in
> the same file (see e.g. builtin/{checkout,log}.c). So we could even
> share something like the safe_create_leading_directories() calling code
> in bugreport.c without libifying it.
> 

You deleted the part where I addressed this suggestion directly:

> Although that issue could be overcome with refactoring, I didn't see a
> major UX benefit of 'git diagnose' vs 'git bugreport --diagnose', so I
> went with the latter, simpler approach.

And, in the process of writing down my thoughts on the UX above, I've become
more convinced that including '--diagnose' in 'git bugreport' is the better
way to present this functionality to users.

>> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
>> 'git bugreport --diagnose', it is not being deprecated in this series.
>> Although deprecation -> removal could be a future cleanup effort, 'scalar
>> diagnose' is kept around for now as an alias for users already accustomed to
>> using it in 'scalar'.
> 
> We don't have a "make install" to get a "scalar" onto user's systems
> yet, do we really need to worry about those users?
> 
> Or is this a reference to the out-of-tree version of "scalar", not
> git.git's?
> 

In practice, it's the "out-of-tree Scalar" users that would care the most.
That said, with Scalar in the Git tree (albeit 'contrib/' and not built by
default), I think it's reasonable to want to avoid breaking changes if
possible. The continued existence of 'scalar diagnose' wouldn't really be
hurting anyone anyway, so there's no pressing need to deprecate it now.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-02 16:26     ` Victoria Dye
@ 2022-08-02 20:51       ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-02 20:51 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee, johannes.schindelin

Victoria Dye <vdye@github.com> writes:

> I chose 'uint64_t' to mimic 'throughput_string()' [1], but the signed
> 'off_t' is a better choice given its use in 'strbuf_humanise_bytes()'.
>
> On a related note, while writing this I made the (unsubstantiated)
> assumption that 'off_t' would be a 64-bit int, even on 32-bit systems. Your
> comment prompted me to confirm that assumption; while 'off_t' is not always
> guaranteed to be an int64 by default [2], Git is compiled with '#define
> _FILE_OFFSET_BITS 64' [3] so 'off_t' is equivalent to 'off64_t'.

Offset into a single file can be smaller than the size of the whole
disk, after all, so from that point of view, use of off_t in
"humanise_bytes" API may be something we would want to fix
eventually to reduce confusion.

I do not particularly mind casting up to uint64, especially if that
matches the code lifted from scalar.  As long as our longer term
plan is to update strbuf_humanise_bytes() to take the widest
possible unsigned integer, what we do right now would not make that
much of a difference.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow
  2022-08-02 16:26     ` Victoria Dye
@ 2022-08-03 12:25       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-03 12:25 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee, johannes.schindelin


On Tue, Aug 02 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Avoid size_t overflow when reporting the available disk space in
>>> 'get_disk_info' by casting the block size and available block count to
>>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>>> bytes of available space.
>>>
>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>> ---
>>>  builtin/bugreport.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>>> index 35b1fc48bf1..720889a37ad 100644
>>> --- a/builtin/bugreport.c
>>> +++ b/builtin/bugreport.c
>>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>>  	}
>>>  
>>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>> 
>> Doesn't this remove the overflow guard on 64 bit systems to support
>> those 32 bit systems?
>> 
>
> It does, but the total disk space available on a system should be able to
> fit into a 64-bit integer. I considered adding an explicit
> 'unsigned_mult_overflows', but decided against it because it's almost
> certainly overkill for such an implausible edge case.

Yeah it's probably overkill, and maybe this is good as-is & we don't
need to worry here.

But that's quite different from what the patch says, it's not "avoid
size_t overflow" but e.g.:

	bugreport.c: don't do size_t overflow check before casting to 32bit

	It's a hassle to support the check on 32 bit systems, and we
	don't think this is something we'll run into in practice [...]

Perhaps?

>> I also don't tthink it's correct that this would "correctly
>> report...". Before this we were simply assuming that "size_t" and
>> "unsigned long" & "fsblkcnt_t" would all yield the same thing.
>> 
>
> The point I was making is that, if your 'size_t' is 32 bits, but you have
> more than ~4GB of disk space available on your system, the result of the
> multiplication will overflow 'size_t'. So, 'st_mult' failing because it
> detects an overflow is "correct", rather than e.g. a false positive.

I think it would be useful to document these assumptions in the commit
message, POSIX just says "blkcnt_t and off_t shall be signed integer
types", and "size_t shall be an unsigned integer type.".

Do other bits of the standard(s) that I've missed say that off_t's
signed type must be double the width of size_t's unsigned, or is it one
of those things that's not standardized but can be relied on in
practice?

We have a related assertion in 37ee680d9b9 (http.postbuffer: allow full
range of ssize_t values, 2017-04-11) (xcurl_off_t()). Perhaps you want
to do something similar to sanity check your assumptions here?

1. https://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/types.h.html

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose'
  2022-08-02 19:48   ` Victoria Dye
@ 2022-08-03 12:34     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-03 12:34 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee, johannes.schindelin


On Tue, Aug 02 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
>>> bugreport --diagnose', so I went with the latter, simpler approach.
>> 
>> I really wanted to like this, but I find the end result here really
>> confusing from a UX perspective.
>> 
>> You can now run "git bugreport --diagnose", which creates a giant *.zip
>> file to go along with your *.txt, but your *.txt makes no reference to
>> it.
>> 
>> Should you ... attach it to your bug report to this mailing list, do
>> something else?
>> 
>> The documentation doesn't offer much in the way of hints, other than
>> suggesting (with --no-report) that this --diagnose is for something
>> entirely different (and that's how "scalar" uses it).
>> 
>> I know what it's really for after reading this series, but for "git
>> bugreport" in particular we should be really careful about not making
>> the UX confusing.
>> 
>> The generated *.zip contains some really deep info about your repo (and
>> not just metadata, e.g. copies of the index, various logs etc.), someone
>> e.g. in a proprietary setting really doesn't want to be sharing that
>> info.
>> 
>> So I would like to see real integration into "git bugreport", i.e. for
>> us to smartly report more repository metrics, e.g. approx number of
>> loose objects, the sort of state "__git_ps1" might report, etc.
>> 
>> But I think the end-state here makes things much more confusing for
>> users.
>> 
>
> The "confusing UX" you describe here doesn't seem to be an inherent issue
> with the implementation (nor is it as insurmountable as you're implying),

I'm not implying or saying that it's insurmountable.

I think in principle having such a mode in "git bugreport" would make
sense.

But the UX here seems to be an afterthought. So I wonder if we shouldn't
hold off on it and just have a new *--helper or something instead.

> it
> largely appears to be an issue of under-documentation. I'll improve that in
> V2 [1], but I want clarify what I was/am going for here as well.
>
> In the context of a bug report, the diagnostics are intended to be used as
> supplemental information to aid in debugging (i.e., attached with the report
> in the email to the list).

Per https://lore.kernel.org/git/?q=n%3Azip we don't block *.zip
attachments, but we have fairly low size limits (to the point of
blocking some large patches).

E.g. e448263716f (po/git.pot: this is now a generated file, 2022-05-26)
(https://lore.kernel.org/git/20220526145035.18958-7-worldhello.net@gmail.com/)
was blocked, and it's just over 500k.

Aside from the data-sharing issues that seems like a good addition to
git-bugreport, i.e. tell the user if the attachment would be blocked due
to its size...

> They're especially valuable when a bug reporter
> isn't very familiar with Git internals and they can't reproduce the issue. A
> lot of bugs can be investigated without those diagnostics, though, which is
> why '--diagnose' isn't "on" by default.
>
> There are also valid use-cases (beyond 'scalar diagnose') for '--no-report':
> someone requests more information after looking into an already-generated
> report, or a developer wants to investigate a bug on their own and use the
> diagnostics as a "starting point" to guide more in-depth debugging. 

Yes, it's useful in a lot of cases. I'm just saying that we really need
to bridge the gap of telling the user what they should be doing with
this new file....

> As for the proprietary data issue, I'd be open to having an option to
> configure which diagnostics a user wants (either something like '--diagnose
> <level>' or a separate option entirely). I'm pretty indifferent on the UI,
> though, so I'll defer to other contributors on 1) if they want that feature,
> and 2) what they think that should look like.

I think it's really important that a bug report feature doesn't submit
private data by default. The name of a "--diagnose" option heavily
implies some aggregate metrics etc.

We then attach what to the user are opaque binary files, but which
contain even the file contants of the repository (index files).

I think scalar was developer for "internal" use-cases where such sharing
wasn't an issue for anyone, but I don't want anyone to get in trouble
because they shared their proprietary source code on the ML while trying
to file a bug report, so defaults really matter.

Especially because if there's one thing we've learned from bug reports &
questions to this & git-users is that the one thing you can rely on is
that users routinely don't carefully scrutinize documentation before
trying out various features...

> [1] https://lore.kernel.org/git/f3235afe-25cc-21a4-fc35-56e35d6be0ce@github.com/
>
>>> An alternative implementation considered was creating a new 'git diagnose'
>>> builtin, but the new command would end up duplicating much of
>>> 'builtin/bugreport.c'.
>> 
>> It seems we always "return" from cmd_bugreport() quite quickly, and we
>> basically only share the code to create the output directory. Just
>> duplicating or sharing that seems like a much better approach for now
>> than creating the above UX confusion.
>> 
>> Note that you can also share code between multiple built-ins, even in
>> the same file (see e.g. builtin/{checkout,log}.c). So we could even
>> share something like the safe_create_leading_directories() calling code
>> in bugreport.c without libifying it.
>> 
>
> You deleted the part where I addressed this suggestion directly:

Yes, the "Although that..." sentence, but I commented on the UX
trade-off elsewhere.

>> Although that issue could be overcome with refactoring, I didn't see a
>> major UX benefit of 'git diagnose' vs 'git bugreport --diagnose', so I
>> went with the latter, simpler approach.
>
> And, in the process of writing down my thoughts on the UX above, I've become
> more convinced that including '--diagnose' in 'git bugreport' is the better
> way to present this functionality to users.

We're on the same page there, we're just discussing if/how to make the
end-to-end process clearer to users.

I don't think it's clear in v1, and is sorely missing something like the
discussion we have around "ANONYMIZING" in git-fast-export(1), and we
really should have a "safe by default". Everything I'm noting here would
be addressed by e.g.:

 * The *.txt output would say "you can additionally attach a diagnostic
   *.zip" file etc. etc., noting if/when the user would do that.

 * We'd have that --diagnostics be safe by default in the way
   --anonymize is, e.g. including stats about number of refs & the like,
   not their contents.

 * We could also have e.g. "--diagnostics --no-anonymize", or
   "--diagnostics=full" or whatever, which would be some approximation
   of the current output.

>>> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
>>> 'git bugreport --diagnose', it is not being deprecated in this series.
>>> Although deprecation -> removal could be a future cleanup effort, 'scalar
>>> diagnose' is kept around for now as an alias for users already accustomed to
>>> using it in 'scalar'.
>> 
>> We don't have a "make install" to get a "scalar" onto user's systems
>> yet, do we really need to worry about those users?
>> 
>> Or is this a reference to the out-of-tree version of "scalar", not
>> git.git's?
>> 
>
> In practice, it's the "out-of-tree Scalar" users that would care the most.
> That said, with Scalar in the Git tree (albeit 'contrib/' and not built by
> default), I think it's reasonable to want to avoid breaking changes if
> possible. The continued existence of 'scalar diagnose' wouldn't really be
> hurting anyone anyway, so there's no pressing need to deprecate it now.

I'm fine with keeping it, but just found the juxtaposition of doing that
& previous discussions about scalar to be surprising.

When it was submitted (from my recollection, I see
Documentation/technical/scalar.txt is rather light on the topic) the
idea was (and this is from my own recollection, I haven't dug up ML
references) that scalar would be "going away", and that the reason to
have it in-tree was to inform the design of the main "git" tooling for
the use-cases scalar is addressing.

So, a bit similar to Cogito (although that was never in-tree).

I'm fine with just keeping it as an alias for "git bugreport
--whatever", but think it would make even more sense to change the
interface a bit as we integrate it to the rest of git, and then "die()"
saying "'scalar xyz' is now 'git abc', mostly....".

Existing scalar users used to 'scalar xyz' would then try the new 'git
abc', and report if/how that fits their use-case.

Then at some point if we do that with all of scalar's features we'd be
at the point of 'git rm'-ing it, as it would have served its stated
purpose.

Just my 0.02, I think in any case having the scalar.txt doc's "Roadmap"
updated to address this would be very useful.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                   ` (8 preceding siblings ...)
  2022-08-02  2:49 ` Ævar Arnfjörð Bjarmason
@ 2022-08-04  1:45 ` Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 01/10] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
                     ` (11 more replies)
  9 siblings, 12 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

As part of the preparation for moving Scalar out of 'contrib/' and into Git,
this series moves the functionality of 'scalar diagnose' into a new builtin
('git diagnose') and a new option ('--diagnose') for 'git bugreport'. This
change further aligns Scalar with the objective [1] of having it only
contain functionality and settings that benefit large Git repositories, but
not all repositories. The diagnostics reported by 'scalar diagnose' relevant
for investigating issues in any Git repository, so generating them should be
part of a "normal" Git builtin.

The series is organized as follows:

 * Miscellaneous fixes for the existing 'scalar diagnose' implementation
 * Moving the code for generating diagnostics into a common location in the
   Git tree
 * Implementing 'git diagnose'
 * Implementing 'git bugreport --diagnose'
 * Updating the Scalar roadmap

Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
'git bugreport --diagnose', it is not being deprecated in this series.
Although deprecation -> removal could be a future cleanup effort, 'scalar
diagnose' is kept around for now as an alias for users already accustomed to
using it in 'scalar'.


Changes since V1
================

 * Reorganized patches to fix minor issues (e.g., more gently adding
   directories to the archive) of 'scalar diagnose' in 'scalar.c', before
   the code is moved out of that file.
 * (Almost) entirely redesigned the UI for generating diagnostics. The new
   approach avoids cluttering 'git bugreport' with a mode that doesn't
   actually generate a report. Now, there are distinct options for different
   use cases: generating extra diagnostics with a bug report ('git bugreport
   --diagnose') and generating diagnostics for personal debugging/addition
   to an existing bug report ('git diagnose').
 * Moved 'get_disk_info()' into 'compat/'.
 * Moved 'create_diagnostics_archive()' into a new 'diagnose.c', as it now
   has multiple callers.
 * Updated command & option documentation to more clearly guide users on how
   to use the new options.
 * Added the '--all' (and '--diagnose=all') option to change the default
   behavior of diagnostics generation to exclude '.git' directory contents.
   For many bug reporters, this would reveal private repository contents
   they don't want to expose to the public mailing list. This has the added
   benefit of creating much smaller archives by default, which will be more
   likely to successfully send to the mailing list.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/pull.1275.v2.git.1657584367.gitgitgadget@gmail.com/

Victoria Dye (10):
  scalar-diagnose: use "$GIT_UNZIP" in test
  scalar-diagnose: avoid 32-bit overflow of size_t
  scalar-diagnose: add directory to archiver more gently
  scalar-diagnose: move 'get_disk_info()' to 'compat/'
  scalar-diagnose: move functionality to common location
  builtin/diagnose.c: create 'git diagnose' builtin
  builtin/diagnose.c: gate certain data behind '--all'
  builtin/bugreport.c: create '--diagnose' option
  scalar-diagnose: use 'git diagnose --all'
  scalar: update technical doc roadmap

 .gitignore                         |   1 +
 Documentation/git-bugreport.txt    |  18 ++
 Documentation/git-diagnose.txt     |  60 +++++++
 Documentation/technical/scalar.txt |   9 +-
 Makefile                           |   2 +
 builtin.h                          |   1 +
 builtin/bugreport.c                |  47 ++++-
 builtin/diagnose.c                 |  62 +++++++
 compat/disk.h                      |  56 ++++++
 contrib/scalar/scalar.c            | 272 +----------------------------
 contrib/scalar/t/t9099-scalar.sh   |   8 +-
 diagnose.c                         | 218 +++++++++++++++++++++++
 diagnose.h                         |   9 +
 git-compat-util.h                  |   1 +
 git.c                              |   1 +
 t/t0091-bugreport.sh               |  44 +++++
 t/t0092-diagnose.sh                |  43 +++++
 17 files changed, 575 insertions(+), 277 deletions(-)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100644 compat/disk.h
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h
 create mode 100755 t/t0092-diagnose.sh


base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1310%2Fvdye%2Fscalar%2Fgeneralize-diagnose-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1310/vdye/scalar/generalize-diagnose-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1310

Range-diff vs v1:

  1:  a7a766de29b !  1:  ad5b60bf11e scalar: use "$GIT_UNZIP" in 'scalar diagnose' test
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    scalar: use "$GIT_UNZIP" in 'scalar diagnose' test
     +    scalar-diagnose: use "$GIT_UNZIP" in test
      
          Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
          'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
  3:  e8abfdfa892 !  2:  7956dc24b30 builtin/bugreport.c: avoid size_t overflow
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    builtin/bugreport.c: avoid size_t overflow
     +    scalar-diagnose: avoid 32-bit overflow of size_t
      
     -    Avoid size_t overflow when reporting the available disk space in
     +    Avoid 32-bit size_t overflow when reporting the available disk space in
          'get_disk_info' by casting the block size and available block count to
     -    'uint64_t' before multiplying them. Without this change, 'st_mult' would
     -    (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
     +    'off_t' before multiplying them. Without this change, 'st_mult' would
     +    (correctly) report a size_t overflow on 32-bit systems at or exceeding 2^32
          bytes of available space.
      
     +    Note that 'off_t' is a 64-bit integer even on 32-bit systems due to the
     +    inclusion of '#define _FILE_OFFSET_BITS 64' in 'git-compat-util.h' (see
     +    b97e911643 (Support for large files on 32bit systems., 2007-02-17)).
     +
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     - ## builtin/bugreport.c ##
     -@@ builtin/bugreport.c: static int get_disk_info(struct strbuf *out)
     + ## contrib/scalar/scalar.c ##
     +@@ contrib/scalar/scalar.c: static int get_disk_info(struct strbuf *out)
       	}
       
       	strbuf_addf(out, "Available space on '%s': ", buf.buf);
      -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
     -+	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
     ++	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
       	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
       	strbuf_release(&buf);
       #endif
  4:  4bc290fbf43 !  3:  23349bfaf8f builtin/bugreport.c: add directory to archiver more gently
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    builtin/bugreport.c: add directory to archiver more gently
     +    scalar-diagnose: add directory to archiver more gently
      
     -    If a directory added to the '--diagnose' archiver does not exist, warn and
     -    return 0 from 'add_directory_to_archiver()' rather than failing with a fatal
     -    error. This handles a failure edge case where the '.git/logs' has not yet
     -    been created when running 'git bugreport --diagnose', but extends to any
     +    If a directory added to the 'scalar diagnose' archiver does not exist, warn
     +    and return 0 from 'add_directory_to_archiver()' rather than failing with a
     +    fatal error. This handles a failure edge case where the '.git/logs' has not
     +    yet been created when running 'scalar diagnose', but extends to any
          situation where a directory may be missing in the '.git' dir.
      
          Now, when a directory is missing a warning is captured in the diagnostic
     -    logs. This provides a user with more complete information than if 'git
     -    bugreport' simply failed with an error.
     +    logs. This provides a user with more complete information than if 'scalar
     +    diagnose' simply failed with an error.
      
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     - ## builtin/bugreport.c ##
     -@@ builtin/bugreport.c: static int add_directory_to_archiver(struct strvec *archiver_args,
     - 				     const char *path, int recurse)
     + ## contrib/scalar/scalar.c ##
     +@@ contrib/scalar/scalar.c: static int add_directory_to_archiver(struct strvec *archiver_args,
     + 					  const char *path, int recurse)
       {
       	int at_root = !*path;
      -	DIR *dir = opendir(at_root ? "." : path);
     @@ builtin/bugreport.c: static int add_directory_to_archiver(struct strvec *archive
       	size_t len;
       	int res = 0;
       
     -+	if (!file_exists(at_root ? "." : path)) {
     -+		warning(_("directory '%s' does not exist, will not be archived"), path);
     -+		return 0;
     -+	}
     -+
     +-	if (!dir)
      +	dir = opendir(at_root ? "." : path);
     - 	if (!dir)
     ++	if (!dir) {
     ++		if (errno == ENOENT) {
     ++			warning(_("could not archive missing directory '%s'"), path);
     ++			return 0;
     ++		}
       		return error_errno(_("could not open directory '%s'"), path);
     ++	}
       
     -
     - ## t/t0091-bugreport.sh ##
     -@@ t/t0091-bugreport.sh: test_expect_success 'indicates populated hooks' '
     - 	test_cmp expect actual
     - '
     - 
     --test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
     -+test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
     - 	test_when_finished rm -rf report &&
     - 
     - 	git bugreport --diagnose -o report -s test >out &&
     -@@ t/t0091-bugreport.sh: test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
     - 	grep "^Total: [0-9][0-9]*" out
     - '
     - 
     -+test_expect_success '--diagnose warns when archived dir does not exist' '
     -+	test_when_finished rm -rf report &&
     -+
     -+	# Remove logs - not guaranteed to exist
     -+	rm -rf .git/logs &&
     -+	git bugreport --diagnose -o report -s test 2>err &&
     -+	grep "directory .\.git/logs. does not exist, will not be archived" err
     -+'
     -+
     - test_done
     + 	if (!at_root)
     + 		strbuf_addf(&buf, "%s/", path);
  -:  ----------- >  4:  05bba1e699f scalar-diagnose: move 'get_disk_info()' to 'compat/'
  6:  4eb3c43d488 !  5:  3a0cb33c658 scalar: use 'git bugreport --diagnose' in 'scalar diagnose'
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    scalar: use 'git bugreport --diagnose' in 'scalar diagnose'
     +    scalar-diagnose: move functionality to common location
      
     -    Replace implementation of 'scalar diagnose' with an internal invocation of
     -    'git bugreport --diagnose --no-report'. The '--diagnose' option of 'git
     -    bugreport' was implemented to mirror what 'scalar diagnose' does, taking
     -    most of its code directly from 'scalar.c'. Remove the now-duplicate code in
     -    'scalar.c' and have 'scalar diagnose' call 'git bugreport' to create the
     -    diagnostics archive.
     +    Move the core functionality of 'scalar diagnose' into a new 'diagnose.[c,h]'
     +    library to prepare for new callers in the main Git tree generating
     +    diagnostic archives. These callers will be introduced in subsequent patches.
      
     -    This introduces two (minor) changes to the output of 'scalar diagnose':
     -    changing "Enlistment root" to "Repository root" in 'diagnostics.log'
     -    ("enlistment root" was inaccurate anyway, as the reported path always
     -    pointed to the root of the repository), and changing the prefix of the zip
     -    archive from 'scalar_' to 'git-diagnostics-'.
     +    While this patch appears large, it is mostly made up of moving code out of
     +    'scalar.c' and into 'diagnose.c'. Specifically, the functions
     +
     +    - dir_file_stats_objects()
     +    - dir_file_stats()
     +    - count_files()
     +    - loose_objs_stats()
     +    - add_directory_to_archiver()
     +
     +    are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
     +    function is a mostly identical (partial) copy of 'cmd_diagnose()', with the
     +    primary changes being that 'zip_path' is an input and "Enlistment root" is
     +    corrected to "Repository root" in the archiver log.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     + ## Makefile ##
     +@@ Makefile: LIB_OBJS += ctype.o
     + LIB_OBJS += date.o
     + LIB_OBJS += decorate.o
     + LIB_OBJS += delta-islands.o
     ++LIB_OBJS += diagnose.o
     + LIB_OBJS += diff-delta.o
     + LIB_OBJS += diff-merges.o
     + LIB_OBJS += diff-lib.o
     +
       ## contrib/scalar/scalar.c ##
      @@
       #include "dir.h"
       #include "packfile.h"
       #include "help.h"
      -#include "archive.h"
     - #include "object-store.h"
     +-#include "object-store.h"
     +-#include "compat/disk.h"
     ++#include "diagnose.h"
       
       /*
     +  * Remove the deepest subdirectory in the provided path string. Path must not
      @@ contrib/scalar/scalar.c: static int unregister_dir(void)
       	return res;
       }
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      -					  const char *path, int recurse)
      -{
      -	int at_root = !*path;
     --	DIR *dir = opendir(at_root ? "." : path);
     +-	DIR *dir;
      -	struct dirent *e;
      -	struct strbuf buf = STRBUF_INIT;
      -	size_t len;
      -	int res = 0;
      -
     --	if (!dir)
     +-	dir = opendir(at_root ? "." : path);
     +-	if (!dir) {
     +-		if (errno == ENOENT) {
     +-			warning(_("could not archive missing directory '%s'"), path);
     +-			return 0;
     +-		}
      -		return error_errno(_("could not open directory '%s'"), path);
     +-	}
      -
      -	if (!at_root)
      -		strbuf_addf(&buf, "%s/", path);
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      -	strbuf_release(&buf);
      -	return res;
      -}
     --
     --#ifndef WIN32
     --#include <sys/statvfs.h>
     --#endif
     --
     --static int get_disk_info(struct strbuf *out)
     --{
     --#ifdef WIN32
     --	struct strbuf buf = STRBUF_INIT;
     --	char volume_name[MAX_PATH], fs_name[MAX_PATH];
     --	DWORD serial_number, component_length, flags;
     --	ULARGE_INTEGER avail2caller, total, avail;
     --
     --	strbuf_realpath(&buf, ".", 1);
     --	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
     --		error(_("could not determine free disk size for '%s'"),
     --		      buf.buf);
     --		strbuf_release(&buf);
     --		return -1;
     --	}
     --
     --	strbuf_setlen(&buf, offset_1st_component(buf.buf));
     --	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
     --				   &serial_number, &component_length, &flags,
     --				   fs_name, sizeof(fs_name))) {
     --		error(_("could not get info for '%s'"), buf.buf);
     --		strbuf_release(&buf);
     --		return -1;
     --	}
     --	strbuf_addf(out, "Available space on '%s': ", buf.buf);
     --	strbuf_humanise_bytes(out, avail2caller.QuadPart);
     --	strbuf_addch(out, '\n');
     --	strbuf_release(&buf);
     --#else
     --	struct strbuf buf = STRBUF_INIT;
     --	struct statvfs stat;
     --
     --	strbuf_realpath(&buf, ".", 1);
     --	if (statvfs(buf.buf, &stat) < 0) {
     --		error_errno(_("could not determine free disk size for '%s'"),
     --			    buf.buf);
     --		strbuf_release(&buf);
     --		return -1;
     --	}
     --
     --	strbuf_addf(out, "Available space on '%s': ", buf.buf);
     --	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
     --	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
     --	strbuf_release(&buf);
     --#endif
     --	return 0;
     --}
      -
       /* printf-style interface, expects `<key>=<value>` argument */
       static int set_config(const char *fmt, ...)
     @@ contrib/scalar/scalar.c: cleanup:
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 		N_("scalar diagnose [<enlistment>]"),
       		NULL
       	};
     --	struct strbuf zip_path = STRBUF_INIT;
     + 	struct strbuf zip_path = STRBUF_INIT;
      -	struct strvec archiver_args = STRVEC_INIT;
      -	char **argv_copy = NULL;
      -	int stdout_fd = -1, archiver_fd = -1;
     --	time_t now = time(NULL);
     --	struct tm tm;
     + 	time_t now = time(NULL);
     + 	struct tm tm;
      -	struct strbuf buf = STRBUF_INIT;
     -+	struct strbuf diagnostics_path = STRBUF_INIT;
       	int res = 0;
       
       	argc = parse_options(argc, argv, NULL, options,
     - 			     usage, 0);
     - 
     --	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
     --
     --	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
     --	strbuf_addftime(&zip_path,
     --			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
     --	strbuf_addstr(&zip_path, ".zip");
     --	switch (safe_create_leading_directories(zip_path.buf)) {
     --	case SCLD_EXISTS:
     --	case SCLD_OK:
     --		break;
     --	default:
     --		error_errno(_("could not create directory for '%s'"),
     --			    zip_path.buf);
     --		goto diagnose_cleanup;
     --	}
     +@@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     + 			    zip_path.buf);
     + 		goto diagnose_cleanup;
     + 	}
      -	stdout_fd = dup(1);
      -	if (stdout_fd < 0) {
      -		res = error_errno(_("could not duplicate stdout"));
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      -	strvec_pushf(&archiver_args,
      -		     "--add-virtual-file=diagnostics.log:%.*s",
      -		     (int)buf.len, buf.buf);
     -+	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_path);
     -+	strbuf_addstr(&diagnostics_path, "/.scalarDiagnostics");
     - 
     +-
      -	strbuf_reset(&buf);
      -	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
      -	dir_file_stats(the_repository->objects->odb, &buf);
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      -		error(_("failed to write archive"));
      -		goto diagnose_cleanup;
      -	}
     --
     + 
      -	if (!res)
      -		fprintf(stderr, "\n"
      -		       "Diagnostics complete.\n"
      -		       "All of the gathered info is captured in '%s'\n",
      -		       zip_path.buf);
     --
     --diagnose_cleanup:
     ++	res = create_diagnostics_archive(&zip_path);
     + 
     + diagnose_cleanup:
      -	if (archiver_fd >= 0) {
      -		close(1);
      -		dup2(stdout_fd, 1);
      -	}
      -	free(argv_copy);
      -	strvec_clear(&archiver_args);
     --	strbuf_release(&zip_path);
     + 	strbuf_release(&zip_path);
      -	strbuf_release(&buf);
     -+	if (run_git("bugreport", "--diagnose", "--no-report",
     -+		    "-s", "%Y%m%d_%H%M%S", "-o", diagnostics_path.buf, NULL) < 0)
     -+		res = -1;
     - 
     -+	strbuf_release(&diagnostics_path);
     +-
       	return res;
       }
       
     +
     + ## diagnose.c (new) ##
     +@@
     ++#include "diagnose.h"
     ++#include "compat/disk.h"
     ++#include "archive.h"
     ++#include "dir.h"
     ++#include "help.h"
     ++#include "strvec.h"
     ++#include "object-store.h"
     ++#include "packfile.h"
     ++
     ++static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     ++				   const char *file_name, void *data)
     ++{
     ++	struct strbuf *buf = data;
     ++	struct stat st;
     ++
     ++	if (!stat(full_path, &st))
     ++		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
     ++			    (uintmax_t)st.st_size);
     ++}
     ++
     ++static int dir_file_stats(struct object_directory *object_dir, void *data)
     ++{
     ++	struct strbuf *buf = data;
     ++
     ++	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
     ++
     ++	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
     ++				  data);
     ++
     ++	return 0;
     ++}
     ++
     ++static int count_files(char *path)
     ++{
     ++	DIR *dir = opendir(path);
     ++	struct dirent *e;
     ++	int count = 0;
     ++
     ++	if (!dir)
     ++		return 0;
     ++
     ++	while ((e = readdir(dir)) != NULL)
     ++		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
     ++			count++;
     ++
     ++	closedir(dir);
     ++	return count;
     ++}
     ++
     ++static void loose_objs_stats(struct strbuf *buf, const char *path)
     ++{
     ++	DIR *dir = opendir(path);
     ++	struct dirent *e;
     ++	int count;
     ++	int total = 0;
     ++	unsigned char c;
     ++	struct strbuf count_path = STRBUF_INIT;
     ++	size_t base_path_len;
     ++
     ++	if (!dir)
     ++		return;
     ++
     ++	strbuf_addstr(buf, "Object directory stats for ");
     ++	strbuf_add_absolute_path(buf, path);
     ++	strbuf_addstr(buf, ":\n");
     ++
     ++	strbuf_add_absolute_path(&count_path, path);
     ++	strbuf_addch(&count_path, '/');
     ++	base_path_len = count_path.len;
     ++
     ++	while ((e = readdir(dir)) != NULL)
     ++		if (!is_dot_or_dotdot(e->d_name) &&
     ++		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
     ++		    !hex_to_bytes(&c, e->d_name, 1)) {
     ++			strbuf_setlen(&count_path, base_path_len);
     ++			strbuf_addstr(&count_path, e->d_name);
     ++			total += (count = count_files(count_path.buf));
     ++			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
     ++		}
     ++
     ++	strbuf_addf(buf, "Total: %d loose objects", total);
     ++
     ++	strbuf_release(&count_path);
     ++	closedir(dir);
     ++}
     ++
     ++static int add_directory_to_archiver(struct strvec *archiver_args,
     ++				     const char *path, int recurse)
     ++{
     ++	int at_root = !*path;
     ++	DIR *dir;
     ++	struct dirent *e;
     ++	struct strbuf buf = STRBUF_INIT;
     ++	size_t len;
     ++	int res = 0;
     ++
     ++	dir = opendir(at_root ? "." : path);
     ++	if (!dir) {
     ++		if (errno == ENOENT) {
     ++			warning(_("could not archive missing directory '%s'"), path);
     ++			return 0;
     ++		}
     ++		return error_errno(_("could not open directory '%s'"), path);
     ++	}
     ++
     ++	if (!at_root)
     ++		strbuf_addf(&buf, "%s/", path);
     ++	len = buf.len;
     ++	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
     ++
     ++	while (!res && (e = readdir(dir))) {
     ++		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
     ++			continue;
     ++
     ++		strbuf_setlen(&buf, len);
     ++		strbuf_addstr(&buf, e->d_name);
     ++
     ++		if (e->d_type == DT_REG)
     ++			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
     ++		else if (e->d_type != DT_DIR)
     ++			warning(_("skipping '%s', which is neither file nor "
     ++				  "directory"), buf.buf);
     ++		else if (recurse &&
     ++			 add_directory_to_archiver(archiver_args,
     ++						   buf.buf, recurse) < 0)
     ++			res = -1;
     ++	}
     ++
     ++	closedir(dir);
     ++	strbuf_release(&buf);
     ++	return res;
     ++}
     ++
     ++int create_diagnostics_archive(struct strbuf *zip_path)
     ++{
     ++	struct strvec archiver_args = STRVEC_INIT;
     ++	char **argv_copy = NULL;
     ++	int stdout_fd = -1, archiver_fd = -1;
     ++	struct strbuf buf = STRBUF_INIT;
     ++	int res;
     ++
     ++	stdout_fd = dup(STDOUT_FILENO);
     ++	if (stdout_fd < 0) {
     ++		res = error_errno(_("could not duplicate stdout"));
     ++		goto diagnose_cleanup;
     ++	}
     ++
     ++	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
     ++	if (dup2(archiver_fd, STDOUT_FILENO) < 0) {
     ++		res = error_errno(_("could not redirect output"));
     ++		goto diagnose_cleanup;
     ++	}
     ++
     ++	init_zip_archiver();
     ++	strvec_pushl(&archiver_args, "git-diagnose", "--format=zip", NULL);
     ++
     ++	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
     ++	get_version_info(&buf, 1);
     ++
     ++	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
     ++	get_disk_info(&buf);
     ++	write_or_die(stdout_fd, buf.buf, buf.len);
     ++	strvec_pushf(&archiver_args,
     ++		     "--add-virtual-file=diagnostics.log:%.*s",
     ++		     (int)buf.len, buf.buf);
     ++
     ++	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
     ++	dir_file_stats(the_repository->objects->odb, &buf);
     ++	foreach_alt_odb(dir_file_stats, &buf);
     ++	strvec_push(&archiver_args, buf.buf);
     ++
     ++	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
     ++	loose_objs_stats(&buf, ".git/objects");
     ++	strvec_push(&archiver_args, buf.buf);
     ++
     ++	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
     ++		goto diagnose_cleanup;
     ++
     ++	strvec_pushl(&archiver_args, "--prefix=",
     ++		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
     ++
     ++	/* `write_archive()` modifies the `argv` passed to it. Let it. */
     ++	argv_copy = xmemdupz(archiver_args.v,
     ++			     sizeof(char *) * archiver_args.nr);
     ++	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
     ++			    the_repository, NULL, 0);
     ++	if (res) {
     ++		error(_("failed to write archive"));
     ++		goto diagnose_cleanup;
     ++	}
     ++
     ++	if (!res)
     ++		fprintf(stderr, "\n"
     ++			"Diagnostics complete.\n"
     ++			"All of the gathered info is captured in '%s'\n",
     ++			zip_path->buf);
     ++
     ++diagnose_cleanup:
     ++	if (archiver_fd >= 0) {
     ++		dup2(stdout_fd, STDOUT_FILENO);
     ++		close(stdout_fd);
     ++		close(archiver_fd);
     ++	}
     ++	free(argv_copy);
     ++	strvec_clear(&archiver_args);
     ++	strbuf_release(&buf);
     ++
     ++	return res;
     ++}
     +
     + ## diagnose.h (new) ##
     +@@
     ++#ifndef DIAGNOSE_H
     ++#define DIAGNOSE_H
     ++
     ++#include "cache.h"
     ++#include "strbuf.h"
     ++
     ++int create_diagnostics_archive(struct strbuf *zip_path);
     ++
     ++#endif /* DIAGNOSE_H */
  -:  ----------- >  6:  73e139ee377 builtin/diagnose.c: create 'git diagnose' builtin
  -:  ----------- >  7:  a3e62a4a041 builtin/diagnose.c: gate certain data behind '--all'
  2:  932dc8cddac !  8:  d81e7c10997 builtin/bugreport.c: create '--diagnose' option
     @@ Commit message
          Create a '--diagnose' option for 'git bugreport' to collect additional
          information about the repository and write it to a zipped archive.
      
     -    The "diagnose" functionality was originally implemented for Scalar in
     -    aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
     -    diagnostics gathered are not specific to Scalar-cloned repositories and
     -    could be useful when diagnosing issues in any Git repository.
     +    The '--diagnose' option behaves effectively as an alias for simultaneously
     +    running 'git bugreport' and 'git diagnose'. In the documentation, users are
     +    explicitly recommended to attach the diagnostics alongside a bug report to
     +    provide additional context to readers, ideally reducing some back-and-forth
     +    between reporters and those debugging the issue.
      
     -    Note that, while this patch appears large, it is mostly copied directly out
     -    of 'scalar.c'. Specifically, the functions
     -
     -    - dir_file_stats_objects()
     -    - dir_file_stats()
     -    - count_files()
     -    - loose_objs_stats()
     -    - add_directory_to_archiver()
     -    - get_disk_info()
     -
     -    are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
     -    function is a mostly unmodified copy of 'cmd_diagnose()', with the primary
     -    changes being that 'zip_path' is an input and "Enlistment root" is corrected
     -    to "Repository root" in the logs.
     -
     -    The remainder of the patch is made up of adding the '--diagnose' option to
     -    'cmd_bugreport()' (including generation of the archive's 'zip_path'),
     -    updating documentation, and adding a test. Note that the test is
     -    'test_expect_failure' due to bugs in the original 'scalar diagnose'. These
     -    will be fixed in subsequent patches.
     +    Note that '--diagnose' may take an optional string arg (either 'basic' or
     +    'all'). If specified without the arg or with 'basic', the behavior
     +    corresponds to running 'git diagnose' without '--all'; this default is meant
     +    to help reduce unintentional leaking of sensitive information). However, a
     +    user can still manually specify '--diagnose=all' to generate the equivalent
     +    archive to one created with 'git diagnose --all'.
      
          Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-bugreport.txt ##
     -@@ Documentation/git-bugreport.txt: git-bugreport - Collect information for user to file a bug report
     - SYNOPSIS
     +@@ Documentation/git-bugreport.txt: SYNOPSIS
       --------
       [verse]
     --'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
     -+'git bugreport' [<options>]
     + 'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
     ++		[--diagnose[=(basic|all)]]
       
       DESCRIPTION
       -----------
     @@ Documentation/git-bugreport.txt: The following information is captured automatic
        - $SHELL
       
      +Additional information may be gathered into a separate zip archive using the
     -+`--diagnose` option.
     ++`--diagnose` option, and can be attached alongside the bugreport document to
     ++provide additional context to readers.
      +
       This tool is invoked via the typical Git setup process, which means that in some
       cases, it might not be able to launch - for example, if a relevant config file
     @@ Documentation/git-bugreport.txt: OPTIONS
       	named 'git-bugreport-<formatted suffix>'. This should take the form of a
       	strftime(3) format string; the current local time will be used.
       
     -+--diagnose::
     ++--diagnose[=(basic|all)]::
      +	Create a zip archive of information about the repository including logs
      +	and certain statistics describing the data shape of the repository. The
      +	archive is written to the same output directory as the bug report and is
      +	named 'git-diagnostics-<formatted suffix>'.
     +++
     ++By default, `--diagnose` (equivalent to `--diagnose=basic`) will collect only
     ++statistics and summarized data about the repository and filesystem. Specifying
     ++`--diagnose=all` will create an archive with the same contents generated by `git
     ++diagnose --all`; this archive will be much larger, and will contain potentially
     ++sensitive information about the repository. See linkgit:git-diagnose[1] for more
     ++details on the contents of the diagnostic archive.
      +
       GIT
       ---
     @@ builtin/bugreport.c
       #include "compat/compiler.h"
       #include "hook.h"
       #include "hook-list.h"
     -+#include "dir.h"
     -+#include "object-store.h"
     -+#include "packfile.h"
     -+#include "archive.h"
     ++#include "diagnose.h"
       
     ++enum diagnose_mode {
     ++	DIAGNOSE_NONE,
     ++	DIAGNOSE_BASIC,
     ++	DIAGNOSE_ALL
     ++};
       
       static void get_system_info(struct strbuf *sys_info)
     -@@ builtin/bugreport.c: static void get_populated_hooks(struct strbuf *hook_info, int nongit)
     - }
     - 
     - static const char * const bugreport_usage[] = {
     --	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
     -+	N_("git bugreport [<options>]"),
     - 	NULL
     - };
     - 
     + {
      @@ builtin/bugreport.c: static void get_header(struct strbuf *buf, const char *title)
       	strbuf_addf(buf, "\n\n[%s]\n", title);
       }
       
     -+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     -+				   const char *file_name, void *data)
     -+{
     -+	struct strbuf *buf = data;
     -+	struct stat st;
     -+
     -+	if (!stat(full_path, &st))
     -+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
     -+			    (uintmax_t)st.st_size);
     -+}
     -+
     -+static int dir_file_stats(struct object_directory *object_dir, void *data)
     -+{
     -+	struct strbuf *buf = data;
     -+
     -+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
     -+
     -+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
     -+				  data);
     -+
     -+	return 0;
     -+}
     -+
     -+static int count_files(char *path)
     -+{
     -+	DIR *dir = opendir(path);
     -+	struct dirent *e;
     -+	int count = 0;
     -+
     -+	if (!dir)
     -+		return 0;
     -+
     -+	while ((e = readdir(dir)) != NULL)
     -+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
     -+			count++;
     -+
     -+	closedir(dir);
     -+	return count;
     -+}
     -+
     -+static void loose_objs_stats(struct strbuf *buf, const char *path)
     -+{
     -+	DIR *dir = opendir(path);
     -+	struct dirent *e;
     -+	int count;
     -+	int total = 0;
     -+	unsigned char c;
     -+	struct strbuf count_path = STRBUF_INIT;
     -+	size_t base_path_len;
     -+
     -+	if (!dir)
     -+		return;
     -+
     -+	strbuf_addstr(buf, "Object directory stats for ");
     -+	strbuf_add_absolute_path(buf, path);
     -+	strbuf_addstr(buf, ":\n");
     -+
     -+	strbuf_add_absolute_path(&count_path, path);
     -+	strbuf_addch(&count_path, '/');
     -+	base_path_len = count_path.len;
     -+
     -+	while ((e = readdir(dir)) != NULL)
     -+		if (!is_dot_or_dotdot(e->d_name) &&
     -+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
     -+		    !hex_to_bytes(&c, e->d_name, 1)) {
     -+			strbuf_setlen(&count_path, base_path_len);
     -+			strbuf_addstr(&count_path, e->d_name);
     -+			total += (count = count_files(count_path.buf));
     -+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
     -+		}
     -+
     -+	strbuf_addf(buf, "Total: %d loose objects", total);
     -+
     -+	strbuf_release(&count_path);
     -+	closedir(dir);
     -+}
     -+
     -+static int add_directory_to_archiver(struct strvec *archiver_args,
     -+				     const char *path, int recurse)
     -+{
     -+	int at_root = !*path;
     -+	DIR *dir = opendir(at_root ? "." : path);
     -+	struct dirent *e;
     -+	struct strbuf buf = STRBUF_INIT;
     -+	size_t len;
     -+	int res = 0;
     -+
     -+	if (!dir)
     -+		return error_errno(_("could not open directory '%s'"), path);
     -+
     -+	if (!at_root)
     -+		strbuf_addf(&buf, "%s/", path);
     -+	len = buf.len;
     -+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
     -+
     -+	while (!res && (e = readdir(dir))) {
     -+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
     -+			continue;
     -+
     -+		strbuf_setlen(&buf, len);
     -+		strbuf_addstr(&buf, e->d_name);
     -+
     -+		if (e->d_type == DT_REG)
     -+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
     -+		else if (e->d_type != DT_DIR)
     -+			warning(_("skipping '%s', which is neither file nor "
     -+				  "directory"), buf.buf);
     -+		else if (recurse &&
     -+			 add_directory_to_archiver(archiver_args,
     -+						   buf.buf, recurse) < 0)
     -+			res = -1;
     -+	}
     -+
     -+	closedir(dir);
     -+	strbuf_release(&buf);
     -+	return res;
     -+}
     -+
     -+#ifndef WIN32
     -+#include <sys/statvfs.h>
     -+#endif
     -+
     -+static int get_disk_info(struct strbuf *out)
     ++static int option_parse_diagnose(const struct option *opt,
     ++				 const char *arg, int unset)
      +{
     -+#ifdef WIN32
     -+	struct strbuf buf = STRBUF_INIT;
     -+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
     -+	DWORD serial_number, component_length, flags;
     -+	ULARGE_INTEGER avail2caller, total, avail;
     -+
     -+	strbuf_realpath(&buf, ".", 1);
     -+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
     -+		error(_("could not determine free disk size for '%s'"),
     -+		      buf.buf);
     -+		strbuf_release(&buf);
     -+		return -1;
     -+	}
     ++	enum diagnose_mode *diagnose = opt->value;
      +
     -+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
     -+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
     -+				   &serial_number, &component_length, &flags,
     -+				   fs_name, sizeof(fs_name))) {
     -+		error(_("could not get info for '%s'"), buf.buf);
     -+		strbuf_release(&buf);
     -+		return -1;
     -+	}
     -+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
     -+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
     -+	strbuf_addch(out, '\n');
     -+	strbuf_release(&buf);
     -+#else
     -+	struct strbuf buf = STRBUF_INIT;
     -+	struct statvfs stat;
     ++	BUG_ON_OPT_NEG(unset);
      +
     -+	strbuf_realpath(&buf, ".", 1);
     -+	if (statvfs(buf.buf, &stat) < 0) {
     -+		error_errno(_("could not determine free disk size for '%s'"),
     -+			    buf.buf);
     -+		strbuf_release(&buf);
     -+		return -1;
     -+	}
     ++	if (!arg || !strcmp(arg, "basic"))
     ++		*diagnose = DIAGNOSE_BASIC;
     ++	else if (!strcmp(arg, "all"))
     ++		*diagnose = DIAGNOSE_ALL;
     ++	else
     ++		die(_("diagnose mode must be either 'basic' or 'all'"));
      +
     -+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
     -+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
     -+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
     -+	strbuf_release(&buf);
     -+#endif
      +	return 0;
      +}
     -+
     -+static int create_diagnostics_archive(struct strbuf *zip_path)
     -+{
     -+	struct strvec archiver_args = STRVEC_INIT;
     -+	char **argv_copy = NULL;
     -+	int stdout_fd = -1, archiver_fd = -1;
     -+	struct strbuf buf = STRBUF_INIT;
     -+	int res = 0;
     -+
     -+	stdout_fd = dup(1);
     -+	if (stdout_fd < 0) {
     -+		res = error_errno(_("could not duplicate stdout"));
     -+		goto diagnose_cleanup;
     -+	}
     -+
     -+	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
     -+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
     -+		res = error_errno(_("could not redirect output"));
     -+		goto diagnose_cleanup;
     -+	}
     -+
     -+	init_zip_archiver();
     -+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
     -+
     -+	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
     -+	get_version_info(&buf, 1);
     -+
     -+	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
     -+	get_disk_info(&buf);
     -+	write_or_die(stdout_fd, buf.buf, buf.len);
     -+	strvec_pushf(&archiver_args,
     -+		     "--add-virtual-file=diagnostics.log:%.*s",
     -+		     (int)buf.len, buf.buf);
     -+
     -+	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
     -+	dir_file_stats(the_repository->objects->odb, &buf);
     -+	foreach_alt_odb(dir_file_stats, &buf);
     -+	strvec_push(&archiver_args, buf.buf);
     -+
     -+	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
     -+	loose_objs_stats(&buf, ".git/objects");
     -+	strvec_push(&archiver_args, buf.buf);
     -+
     -+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     -+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     -+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     -+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     -+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
     -+		goto diagnose_cleanup;
     -+
     -+	strvec_pushl(&archiver_args, "--prefix=",
     -+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
     -+
     -+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
     -+	argv_copy = xmemdupz(archiver_args.v,
     -+			     sizeof(char *) * archiver_args.nr);
     -+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
     -+			    the_repository, NULL, 0);
     -+	if (res) {
     -+		error(_("failed to write archive"));
     -+		goto diagnose_cleanup;
     -+	}
     -+
     -+	if (!res)
     -+		fprintf(stderr, "\n"
     -+			"Diagnostics complete.\n"
     -+			"All of the gathered info is captured in '%s'\n",
     -+			zip_path->buf);
     -+
     -+diagnose_cleanup:
     -+	if (archiver_fd >= 0) {
     -+		close(1);
     -+		dup2(stdout_fd, 1);
     -+	}
     -+	free(argv_copy);
     -+	strvec_clear(&archiver_args);
     -+	strbuf_release(&buf);
     -+
     -+	return res;
     -+}
      +
       int cmd_bugreport(int argc, const char **argv, const char *prefix)
       {
     @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *
       	int report = -1;
       	time_t now = time(NULL);
       	struct tm tm;
     -+	int diagnose = 0;
     ++	enum diagnose_mode diagnose = DIAGNOSE_NONE;
       	char *option_output = NULL;
       	char *option_suffix = "%Y-%m-%d-%H%M";
       	const char *user_relative_path = NULL;
     @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *
      +	size_t output_path_len;
       
       	const struct option bugreport_options[] = {
     -+		OPT_BOOL(0, "diagnose", &diagnose,
     -+			 N_("generate a diagnostics zip archive")),
     ++		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("(basic|all)"),
     ++			       N_("create an additional zip archive of detailed diagnostics"),
     ++			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG, option_parse_diagnose),
       		OPT_STRING('o', "output-directory", &option_output, N_("path"),
      -			   N_("specify a destination for the bugreport file")),
      +			   N_("specify a destination for the bugreport file(s)")),
     @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *
       	}
       
      +	/* Prepare diagnostics, if requested */
     -+	if (diagnose) {
     ++	if (diagnose != DIAGNOSE_NONE) {
      +		struct strbuf zip_path = STRBUF_INIT;
      +		strbuf_add(&zip_path, report_path.buf, output_path_len);
      +		strbuf_addstr(&zip_path, "git-diagnostics-");
      +		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
      +		strbuf_addstr(&zip_path, ".zip");
      +
     -+		if (create_diagnostics_archive(&zip_path))
     ++		if (create_diagnostics_archive(&zip_path, diagnose == DIAGNOSE_ALL))
      +			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
      +
      +		strbuf_release(&zip_path);
     @@ t/t0091-bugreport.sh: test_expect_success 'indicates populated hooks' '
       	test_cmp expect actual
       '
       
     -+test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
     ++test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
      +	test_when_finished rm -rf report &&
      +
      +	git bugreport --diagnose -o report -s test >out &&
     @@ t/t0091-bugreport.sh: test_expect_success 'indicates populated hooks' '
      +	grep ".git/objects" out &&
      +
      +	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
     -+	grep "^Total: [0-9][0-9]*" out
     ++	grep "^Total: [0-9][0-9]*" out &&
     ++
     ++	# Should not include .git directory contents by default
     ++	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
     ++'
     ++
     ++test_expect_success UNZIP '--diagnose=basic excludes .git dir contents' '
     ++	test_when_finished rm -rf report &&
     ++
     ++	git bugreport --diagnose=basic -o report -s test >out &&
     ++
     ++	# Should not include .git directory contents
     ++	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
     ++'
     ++
     ++test_expect_success UNZIP '--diagnose=all includes .git dir contents' '
     ++	test_when_finished rm -rf report &&
     ++
     ++	git bugreport --diagnose=all -o report -s test >out &&
     ++
     ++	# Should include .git directory contents
     ++	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
     ++
     ++	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
     ++	test_file_not_empty out
      +'
      +
       test_done
  5:  d6527049a4f <  -:  ----------- builtin/bugreport.c: add '--no-report' option
  -:  ----------- >  9:  6834bdcaea8 scalar-diagnose: use 'git diagnose --all'
  7:  86d40a4bd15 ! 10:  14925c3feed scalar: update technical doc roadmap
     @@ Commit message
          scalar: update technical doc roadmap
      
          Update the Scalar roadmap to reflect the completion of generalizing 'scalar
     -    diagnose' into 'git bugreport --diagnose'.
     +    diagnose' into 'git diagnose' and 'git bugreport --diagnose'.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ Documentation/technical/scalar.txt: series have been accepted:
       - `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
       
      +- `scalar-generalize-diagnose`: Move the functionality of `scalar diagnose`
     -+  into `git bugreport --diagnose`.
     ++  into `git diagnose` and `git bugreport --diagnose`.
      +
       Roughly speaking (and subject to change), the following series are needed to
       "finish" this initial version of Scalar:

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 01/10] scalar-diagnose: use "$GIT_UNZIP" in test
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 02/10] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
tests on systems where 'unzip' is not in the system path.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/t/t9099-scalar.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 10b1172a8aa..fac86a57550 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -109,14 +109,14 @@ test_expect_success UNZIP 'scalar diagnose' '
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-	unzip -v "$zip_path" &&
+	"$GIT_UNZIP" -v "$zip_path" &&
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
-	unzip -p "$zip_path" diagnostics.log >out &&
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
-	unzip -p "$zip_path" packs-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
 	grep "$(pwd)/.git/objects" out &&
-	unzip -p "$zip_path" objects-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
 	grep "^Total: [1-9]" out
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 02/10] scalar-diagnose: avoid 32-bit overflow of size_t
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 01/10] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Avoid 32-bit size_t overflow when reporting the available disk space in
'get_disk_info' by casting the block size and available block count to
'off_t' before multiplying them. Without this change, 'st_mult' would
(correctly) report a size_t overflow on 32-bit systems at or exceeding 2^32
bytes of available space.

Note that 'off_t' is a 64-bit integer even on 32-bit systems due to the
inclusion of '#define _FILE_OFFSET_BITS 64' in 'git-compat-util.h' (see
b97e911643 (Support for large files on 32bit systems., 2007-02-17)).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 97e71fe19cd..04046452284 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -348,7 +348,7 @@ static int get_disk_info(struct strbuf *out)
 	}
 
 	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
 	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
 	strbuf_release(&buf);
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 01/10] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 02/10] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  6:19     ` Ævar Arnfjörð Bjarmason
  2022-08-04  1:45   ` [PATCH v2 04/10] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If a directory added to the 'scalar diagnose' archiver does not exist, warn
and return 0 from 'add_directory_to_archiver()' rather than failing with a
fatal error. This handles a failure edge case where the '.git/logs' has not
yet been created when running 'scalar diagnose', but extends to any
situation where a directory may be missing in the '.git' dir.

Now, when a directory is missing a warning is captured in the diagnostic
logs. This provides a user with more complete information than if 'scalar
diagnose' simply failed with an error.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 04046452284..b9092f0b612 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 					  const char *path, int recurse)
 {
 	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
+	DIR *dir;
 	struct dirent *e;
 	struct strbuf buf = STRBUF_INIT;
 	size_t len;
 	int res = 0;
 
-	if (!dir)
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
 		return error_errno(_("could not open directory '%s'"), path);
+	}
 
 	if (!at_root)
 		strbuf_addf(&buf, "%s/", path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 04/10] scalar-diagnose: move 'get_disk_info()' to 'compat/'
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  1:45   ` [PATCH v2 05/10] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move 'get_disk_info()' function into 'compat/'. Although Scalar-specific
code is generally not part of the main Git tree, 'get_disk_info()' will be
used in subsequent patches by additional callers beyond 'scalar diagnose'.
This patch prepares for that change, at which point this platform-specific
code should be part of 'compat/' as a matter of convention.

The function is copied *mostly* verbatim, with two exceptions:

* '#ifdef WIN32' is replaced with '#ifdef GIT_WINDOWS_NATIVE' to allow
  'statvfs' to be used with Cygwin.
* the 'struct strbuf buf' and 'int res' (as well as their corresponding
  cleanup & return) are moved outside of the '#ifdef' block.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 compat/disk.h           | 56 +++++++++++++++++++++++++++++++++++++++++
 contrib/scalar/scalar.c | 53 +-------------------------------------
 git-compat-util.h       |  1 +
 3 files changed, 58 insertions(+), 52 deletions(-)
 create mode 100644 compat/disk.h

diff --git a/compat/disk.h b/compat/disk.h
new file mode 100644
index 00000000000..50a32e3d8a4
--- /dev/null
+++ b/compat/disk.h
@@ -0,0 +1,56 @@
+#ifndef COMPAT_DISK_H
+#define COMPAT_DISK_H
+
+#include "git-compat-util.h"
+
+static int get_disk_info(struct strbuf *out)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int res = 0;
+
+#ifdef GIT_WINDOWS_NATIVE
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+#else
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+#endif
+
+cleanup:
+	strbuf_release(&buf);
+	return res;
+}
+
+#endif /* COMPAT_DISK_H */
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index b9092f0b612..607fedefd82 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -13,6 +13,7 @@
 #include "help.h"
 #include "archive.h"
 #include "object-store.h"
+#include "compat/disk.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -309,58 +310,6 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-#ifndef WIN32
-#include <sys/statvfs.h>
-#endif
-
-static int get_disk_info(struct strbuf *out)
-{
-#ifdef WIN32
-	struct strbuf buf = STRBUF_INIT;
-	char volume_name[MAX_PATH], fs_name[MAX_PATH];
-	DWORD serial_number, component_length, flags;
-	ULARGE_INTEGER avail2caller, total, avail;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
-		error(_("could not determine free disk size for '%s'"),
-		      buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_setlen(&buf, offset_1st_component(buf.buf));
-	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
-				   &serial_number, &component_length, &flags,
-				   fs_name, sizeof(fs_name))) {
-		error(_("could not get info for '%s'"), buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, avail2caller.QuadPart);
-	strbuf_addch(out, '\n');
-	strbuf_release(&buf);
-#else
-	struct strbuf buf = STRBUF_INIT;
-	struct statvfs stat;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (statvfs(buf.buf, &stat) < 0) {
-		error_errno(_("could not determine free disk size for '%s'"),
-			    buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
-	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
-	strbuf_release(&buf);
-#endif
-	return 0;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 58d7708296b..9a62e3a0d2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -258,6 +258,7 @@ static inline int is_xplatform_dir_sep(int c)
 #include <sys/resource.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/statvfs.h>
 #include <termios.h>
 #ifndef NO_SYS_SELECT_H
 #include <sys/select.h>
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 05/10] scalar-diagnose: move functionality to common location
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 04/10] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  6:24     ` Ævar Arnfjörð Bjarmason
  2022-08-04  1:45   ` [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the core functionality of 'scalar diagnose' into a new 'diagnose.[c,h]'
library to prepare for new callers in the main Git tree generating
diagnostic archives. These callers will be introduced in subsequent patches.

While this patch appears large, it is mostly made up of moving code out of
'scalar.c' and into 'diagnose.c'. Specifically, the functions

- dir_file_stats_objects()
- dir_file_stats()
- count_files()
- loose_objs_stats()
- add_directory_to_archiver()

are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
function is a mostly identical (partial) copy of 'cmd_diagnose()', with the
primary changes being that 'zip_path' is an input and "Enlistment root" is
corrected to "Repository root" in the archiver log.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                |   1 +
 contrib/scalar/scalar.c | 202 +------------------------------------
 diagnose.c              | 216 ++++++++++++++++++++++++++++++++++++++++
 diagnose.h              |   9 ++
 4 files changed, 228 insertions(+), 200 deletions(-)
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h

diff --git a/Makefile b/Makefile
index 1624471badc..ad27a0bd70c 100644
--- a/Makefile
+++ b/Makefile
@@ -932,6 +932,7 @@ LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += delta-islands.o
+LIB_OBJS += diagnose.o
 LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 607fedefd82..3983def760a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,9 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "archive.h"
-#include "object-store.h"
-#include "compat/disk.h"
+#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -263,53 +261,6 @@ static int unregister_dir(void)
 	return res;
 }
 
-static int add_directory_to_archiver(struct strvec *archiver_args,
-					  const char *path, int recurse)
-{
-	int at_root = !*path;
-	DIR *dir;
-	struct dirent *e;
-	struct strbuf buf = STRBUF_INIT;
-	size_t len;
-	int res = 0;
-
-	dir = opendir(at_root ? "." : path);
-	if (!dir) {
-		if (errno == ENOENT) {
-			warning(_("could not archive missing directory '%s'"), path);
-			return 0;
-		}
-		return error_errno(_("could not open directory '%s'"), path);
-	}
-
-	if (!at_root)
-		strbuf_addf(&buf, "%s/", path);
-	len = buf.len;
-	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
-
-	while (!res && (e = readdir(dir))) {
-		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
-			continue;
-
-		strbuf_setlen(&buf, len);
-		strbuf_addstr(&buf, e->d_name);
-
-		if (e->d_type == DT_REG)
-			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
-		else if (e->d_type != DT_DIR)
-			warning(_("skipping '%s', which is neither file nor "
-				  "directory"), buf.buf);
-		else if (recurse &&
-			 add_directory_to_archiver(archiver_args,
-						   buf.buf, recurse) < 0)
-			res = -1;
-	}
-
-	closedir(dir);
-	strbuf_release(&buf);
-	return res;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -550,83 +501,6 @@ cleanup:
 	return res;
 }
 
-static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
-				   const char *file_name, void *data)
-{
-	struct strbuf *buf = data;
-	struct stat st;
-
-	if (!stat(full_path, &st))
-		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
-			    (uintmax_t)st.st_size);
-}
-
-static int dir_file_stats(struct object_directory *object_dir, void *data)
-{
-	struct strbuf *buf = data;
-
-	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
-
-	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
-				  data);
-
-	return 0;
-}
-
-static int count_files(char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count = 0;
-
-	if (!dir)
-		return 0;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
-			count++;
-
-	closedir(dir);
-	return count;
-}
-
-static void loose_objs_stats(struct strbuf *buf, const char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count;
-	int total = 0;
-	unsigned char c;
-	struct strbuf count_path = STRBUF_INIT;
-	size_t base_path_len;
-
-	if (!dir)
-		return;
-
-	strbuf_addstr(buf, "Object directory stats for ");
-	strbuf_add_absolute_path(buf, path);
-	strbuf_addstr(buf, ":\n");
-
-	strbuf_add_absolute_path(&count_path, path);
-	strbuf_addch(&count_path, '/');
-	base_path_len = count_path.len;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) &&
-		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
-		    !hex_to_bytes(&c, e->d_name, 1)) {
-			strbuf_setlen(&count_path, base_path_len);
-			strbuf_addstr(&count_path, e->d_name);
-			total += (count = count_files(count_path.buf));
-			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
-		}
-
-	strbuf_addf(buf, "Total: %d loose objects", total);
-
-	strbuf_release(&count_path);
-	closedir(dir);
-}
-
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -637,12 +511,8 @@ static int cmd_diagnose(int argc, const char **argv)
 		NULL
 	};
 	struct strbuf zip_path = STRBUF_INIT;
-	struct strvec archiver_args = STRVEC_INIT;
-	char **argv_copy = NULL;
-	int stdout_fd = -1, archiver_fd = -1;
 	time_t now = time(NULL);
 	struct tm tm;
-	struct strbuf buf = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
@@ -663,79 +533,11 @@ static int cmd_diagnose(int argc, const char **argv)
 			    zip_path.buf);
 		goto diagnose_cleanup;
 	}
-	stdout_fd = dup(1);
-	if (stdout_fd < 0) {
-		res = error_errno(_("could not duplicate stdout"));
-		goto diagnose_cleanup;
-	}
-
-	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
-	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
-		res = error_errno(_("could not redirect output"));
-		goto diagnose_cleanup;
-	}
-
-	init_zip_archiver();
-	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
-	get_version_info(&buf, 1);
-
-	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
-	get_disk_info(&buf);
-	write_or_die(stdout_fd, buf.buf, buf.len);
-	strvec_pushf(&archiver_args,
-		     "--add-virtual-file=diagnostics.log:%.*s",
-		     (int)buf.len, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
-	dir_file_stats(the_repository->objects->odb, &buf);
-	foreach_alt_odb(dir_file_stats, &buf);
-	strvec_push(&archiver_args, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
-	loose_objs_stats(&buf, ".git/objects");
-	strvec_push(&archiver_args, buf.buf);
-
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
-		goto diagnose_cleanup;
-
-	strvec_pushl(&archiver_args, "--prefix=",
-		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
-
-	/* `write_archive()` modifies the `argv` passed to it. Let it. */
-	argv_copy = xmemdupz(archiver_args.v,
-			     sizeof(char *) * archiver_args.nr);
-	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
-			    the_repository, NULL, 0);
-	if (res) {
-		error(_("failed to write archive"));
-		goto diagnose_cleanup;
-	}
 
-	if (!res)
-		fprintf(stderr, "\n"
-		       "Diagnostics complete.\n"
-		       "All of the gathered info is captured in '%s'\n",
-		       zip_path.buf);
+	res = create_diagnostics_archive(&zip_path);
 
 diagnose_cleanup:
-	if (archiver_fd >= 0) {
-		close(1);
-		dup2(stdout_fd, 1);
-	}
-	free(argv_copy);
-	strvec_clear(&archiver_args);
 	strbuf_release(&zip_path);
-	strbuf_release(&buf);
-
 	return res;
 }
 
diff --git a/diagnose.c b/diagnose.c
new file mode 100644
index 00000000000..6c3774afb19
--- /dev/null
+++ b/diagnose.c
@@ -0,0 +1,216 @@
+#include "diagnose.h"
+#include "compat/disk.h"
+#include "archive.h"
+#include "dir.h"
+#include "help.h"
+#include "strvec.h"
+#include "object-store.h"
+#include "packfile.h"
+
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
+static int add_directory_to_archiver(struct strvec *archiver_args,
+				     const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir;
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
+		return error_errno(_("could not open directory '%s'"), path);
+	}
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+int create_diagnostics_archive(struct strbuf *zip_path)
+{
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	struct strbuf buf = STRBUF_INIT;
+	int res;
+
+	stdout_fd = dup(STDOUT_FILENO);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (dup2(archiver_fd, STDOUT_FILENO) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "git-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+			"Diagnostics complete.\n"
+			"All of the gathered info is captured in '%s'\n",
+			zip_path->buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		dup2(stdout_fd, STDOUT_FILENO);
+		close(stdout_fd);
+		close(archiver_fd);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&buf);
+
+	return res;
+}
diff --git a/diagnose.h b/diagnose.h
new file mode 100644
index 00000000000..e86e8a3c962
--- /dev/null
+++ b/diagnose.h
@@ -0,0 +1,9 @@
+#ifndef DIAGNOSE_H
+#define DIAGNOSE_H
+
+#include "cache.h"
+#include "strbuf.h"
+
+int create_diagnostics_archive(struct strbuf *zip_path);
+
+#endif /* DIAGNOSE_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 05/10] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  6:27     ` Ævar Arnfjörð Bjarmason
  2022-08-05 19:11     ` Derrick Stolee
  2022-08-04  1:45   ` [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all' Victoria Dye via GitGitGadget
                     ` (5 subsequent siblings)
  11 siblings, 2 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.

The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 .gitignore                     |  1 +
 Documentation/git-diagnose.txt | 52 ++++++++++++++++++++++++++++++
 Makefile                       |  1 +
 builtin.h                      |  1 +
 builtin/diagnose.c             | 58 ++++++++++++++++++++++++++++++++++
 git.c                          |  1 +
 t/t0092-diagnose.sh            | 28 ++++++++++++++++
 7 files changed, 142 insertions(+)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100755 t/t0092-diagnose.sh

diff --git a/.gitignore b/.gitignore
index 42fd7253b44..80b530bbed2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,6 +53,7 @@
 /git-cvsimport
 /git-cvsserver
 /git-daemon
+/git-diagnose
 /git-diff
 /git-diff-files
 /git-diff-index
diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
new file mode 100644
index 00000000000..b12ef98f013
--- /dev/null
+++ b/Documentation/git-diagnose.txt
@@ -0,0 +1,52 @@
+git-diagnose(1)
+================
+
+NAME
+----
+git-diagnose - Generate a zip archive of diagnostic information
+
+SYNOPSIS
+--------
+[verse]
+'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+
+DESCRIPTION
+-----------
+Collects detailed information about the user's machine, Git client, and
+repository state and packages that information into a zip archive. The
+generated archive can then, for example, be shared with the Git mailing list to
+help debug an issue or serve as a reference for independent debugging.
+
+The following information is captured in the archive:
+
+  * 'git version --build-options'
+  * The path to the repository root
+  * The available disk space on the filesystem
+  * The name and size of each packfile, including those in alternate object
+    stores
+  * The total count of loose objects, as well as counts broken down by
+    `.git/objects` subdirectory
+  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
+    `.git/objects/info` directories
+
+This tool differs from linkgit:git-bugreport[1] in that it collects much more
+detailed information with a greater focus on reporting the size and data shape
+of repository contents.
+
+OPTIONS
+-------
+-o <path>::
+--output-directory <path>::
+	Place the resulting diagnostics archive in `<path>` instead of the
+	current directory.
+
+-s <format>::
+--suffix <format>::
+	Specify an alternate suffix for the diagnostics archive name, to create
+	a file named 'git-diagnostics-<formatted suffix>'. This should take the
+	form of a strftime(3) format string; the current local time will be
+	used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ad27a0bd70c..9ceaf55582a 100644
--- a/Makefile
+++ b/Makefile
@@ -1154,6 +1154,7 @@ BUILTIN_OBJS += builtin/credential-cache.o
 BUILTIN_OBJS += builtin/credential-store.o
 BUILTIN_OBJS += builtin/credential.o
 BUILTIN_OBJS += builtin/describe.o
+BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-tree.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc8485..8901a34d6bf 100644
--- a/builtin.h
+++ b/builtin.h
@@ -144,6 +144,7 @@ int cmd_credential_cache(int argc, const char **argv, const char *prefix);
 int cmd_credential_cache_daemon(int argc, const char **argv, const char *prefix);
 int cmd_credential_store(int argc, const char **argv, const char *prefix);
 int cmd_describe(int argc, const char **argv, const char *prefix);
+int cmd_diagnose(int argc, const char **argv, const char *prefix);
 int cmd_diff_files(int argc, const char **argv, const char *prefix);
 int cmd_diff_index(int argc, const char **argv, const char *prefix);
 int cmd_diff(int argc, const char **argv, const char *prefix);
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
new file mode 100644
index 00000000000..c545c6bae1d
--- /dev/null
+++ b/builtin/diagnose.c
@@ -0,0 +1,58 @@
+#include "builtin.h"
+#include "parse-options.h"
+#include "diagnose.h"
+
+
+static const char * const diagnose_usage[] = {
+	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	NULL
+};
+
+int cmd_diagnose(int argc, const char **argv, const char *prefix)
+{
+	struct strbuf zip_path = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	char *option_output = NULL;
+	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *prefixed_filename;
+
+	const struct option diagnose_options[] = {
+		OPT_STRING('o', "output-directory", &option_output, N_("path"),
+			   N_("specify a destination for the diagnostics archive")),
+		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
+			   N_("specify a strftime format suffix for the filename")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, diagnose_options,
+			     diagnose_usage, 0);
+
+	/* Prepare the path to put the result */
+	prefixed_filename = prefix_filename(prefix,
+					    option_output ? option_output : "");
+	strbuf_addstr(&zip_path, prefixed_filename);
+	strbuf_complete(&zip_path, '/');
+
+	strbuf_addstr(&zip_path, "git-diagnostics-");
+	strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_OK:
+	case SCLD_EXISTS:
+		break;
+	default:
+		die(_("could not create leading directories for '%s'"),
+		    zip_path.buf);
+	}
+
+	/* Prepare diagnostics */
+	if (create_diagnostics_archive(&zip_path))
+		die_errno(_("unable to create diagnostics archive %s"),
+			  zip_path.buf);
+
+	free(prefixed_filename);
+	strbuf_release(&zip_path);
+	return 0;
+}
diff --git a/git.c b/git.c
index e5d62fa5a92..0b9d8ef7677 100644
--- a/git.c
+++ b/git.c
@@ -522,6 +522,7 @@ static struct cmd_struct commands[] = {
 	{ "credential-cache--daemon", cmd_credential_cache_daemon },
 	{ "credential-store", cmd_credential_store },
 	{ "describe", cmd_describe, RUN_SETUP },
+	{ "diagnose", cmd_diagnose, RUN_SETUP_GENTLY },
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
new file mode 100755
index 00000000000..fa05bf6046f
--- /dev/null
+++ b/t/t0092-diagnose.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='git diagnose'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success UNZIP 'creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all'
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  6:39     ` Ævar Arnfjörð Bjarmason
  2022-08-04  1:45   ` [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'git diagnose' to *not* include '.git/' directory contents by
default, instead requiring specification of a '--all' option to include it.
While helpful for debugging, the archived '.git/' directory contents may be
sensitive, as they can be used to reconstruct an entire repository.

To guard against users inadvertently including this information in
diagnostics and sharing it (e.g., with the mailing list), '.git/' directory
contents will only be included if '--all' is specified.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-diagnose.txt | 12 ++++++++++--
 builtin/diagnose.c             |  8 ++++++--
 contrib/scalar/scalar.c        |  2 +-
 diagnose.c                     | 14 ++++++++------
 diagnose.h                     |  2 +-
 t/t0092-diagnose.sh            | 17 ++++++++++++++++-
 6 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
index b12ef98f013..374b7402511 100644
--- a/Documentation/git-diagnose.txt
+++ b/Documentation/git-diagnose.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+	       [-a | --all]
 
 DESCRIPTION
 -----------
@@ -26,8 +27,6 @@ The following information is captured in the archive:
     stores
   * The total count of loose objects, as well as counts broken down by
     `.git/objects` subdirectory
-  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
-    `.git/objects/info` directories
 
 This tool differs from linkgit:git-bugreport[1] in that it collects much more
 detailed information with a greater focus on reporting the size and data shape
@@ -47,6 +46,15 @@ OPTIONS
 	form of a strftime(3) format string; the current local time will be
 	used.
 
+-a::
+--all::
+	Include more complete repository diagnostic information in the archive.
+	Specifically, this will add copies of `.git`, `.git/hooks`, `.git/info`,
+	`.git/logs`, and `.git/objects/info` directories to the output archive.
+	This additional data may be sensitive; a user can reconstruct the full
+	contents of the diagnosed repository with this information. Users should
+	exercise caution when sharing an archive generated with this option.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
index c545c6bae1d..0a2a63fdfbc 100644
--- a/builtin/diagnose.c
+++ b/builtin/diagnose.c
@@ -4,7 +4,7 @@
 
 
 static const char * const diagnose_usage[] = {
-	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>] [-a|--all]"),
 	NULL
 };
 
@@ -13,6 +13,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	struct strbuf zip_path = STRBUF_INIT;
 	time_t now = time(NULL);
 	struct tm tm;
+	int include_everything = 0;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	char *prefixed_filename;
@@ -22,6 +23,9 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 			   N_("specify a destination for the diagnostics archive")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
 			   N_("specify a strftime format suffix for the filename")),
+		OPT_BOOL_F('a', "all", &include_everything,
+			   N_("collect complete diagnostic information"),
+			   PARSE_OPT_NONEG),
 		OPT_END()
 	};
 
@@ -48,7 +52,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	}
 
 	/* Prepare diagnostics */
-	if (create_diagnostics_archive(&zip_path))
+	if (create_diagnostics_archive(&zip_path, include_everything))
 		die_errno(_("unable to create diagnostics archive %s"),
 			  zip_path.buf);
 
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 3983def760a..b10955531ce 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -534,7 +534,7 @@ static int cmd_diagnose(int argc, const char **argv)
 		goto diagnose_cleanup;
 	}
 
-	res = create_diagnostics_archive(&zip_path);
+	res = create_diagnostics_archive(&zip_path, 1);
 
 diagnose_cleanup:
 	strbuf_release(&zip_path);
diff --git a/diagnose.c b/diagnose.c
index 6c3774afb19..6be53d7a1f8 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -131,7 +131,7 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-int create_diagnostics_archive(struct strbuf *zip_path)
+int create_diagnostics_archive(struct strbuf *zip_path, int include_everything)
 {
 	struct strvec archiver_args = STRVEC_INIT;
 	char **argv_copy = NULL;
@@ -176,11 +176,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
 	loose_objs_stats(&buf, ".git/objects");
 	strvec_push(&archiver_args, buf.buf);
 
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+	/* Only include this if explicitly requested */
+	if (include_everything &&
+	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
 		goto diagnose_cleanup;
 
 	strvec_pushl(&archiver_args, "--prefix=",
diff --git a/diagnose.h b/diagnose.h
index e86e8a3c962..c0c5daf65e7 100644
--- a/diagnose.h
+++ b/diagnose.h
@@ -4,6 +4,6 @@
 #include "cache.h"
 #include "strbuf.h"
 
-int create_diagnostics_archive(struct strbuf *zip_path);
+int create_diagnostics_archive(struct strbuf *zip_path, int include_everything);
 
 #endif /* DIAGNOSE_H */
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
index fa05bf6046f..3c808698d3f 100755
--- a/t/t0092-diagnose.sh
+++ b/t/t0092-diagnose.sh
@@ -22,7 +22,22 @@ test_expect_success UNZIP 'creates diagnostics zip archive' '
 	grep ".git/objects" out &&
 
 	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
-	grep "^Total: [0-9][0-9]*" out
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--all includes .git data in archive' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test --all >out &&
+
+	# Should include .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all' Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-05 19:35     ` Derrick Stolee
  2022-08-04  1:45   ` [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all' Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.

The '--diagnose' option behaves effectively as an alias for simultaneously
running 'git bugreport' and 'git diagnose'. In the documentation, users are
explicitly recommended to attach the diagnostics alongside a bug report to
provide additional context to readers, ideally reducing some back-and-forth
between reporters and those debugging the issue.

Note that '--diagnose' may take an optional string arg (either 'basic' or
'all'). If specified without the arg or with 'basic', the behavior
corresponds to running 'git diagnose' without '--all'; this default is meant
to help reduce unintentional leaking of sensitive information). However, a
user can still manually specify '--diagnose=all' to generate the equivalent
archive to one created with 'git diagnose --all'.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-bugreport.txt | 18 +++++++++++++
 builtin/bugreport.c             | 47 +++++++++++++++++++++++++++++++--
 t/t0091-bugreport.sh            | 44 ++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
index d8817bf3cec..a4d984a77be 100644
--- a/Documentation/git-bugreport.txt
+++ b/Documentation/git-bugreport.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+		[--diagnose[=(basic|all)]]
 
 DESCRIPTION
 -----------
@@ -31,6 +32,10 @@ The following information is captured automatically:
  - A list of enabled hooks
  - $SHELL
 
+Additional information may be gathered into a separate zip archive using the
+`--diagnose` option, and can be attached alongside the bugreport document to
+provide additional context to readers.
+
 This tool is invoked via the typical Git setup process, which means that in some
 cases, it might not be able to launch - for example, if a relevant config file
 is unreadable. In this kind of scenario, it may be helpful to manually gather
@@ -49,6 +54,19 @@ OPTIONS
 	named 'git-bugreport-<formatted suffix>'. This should take the form of a
 	strftime(3) format string; the current local time will be used.
 
+--diagnose[=(basic|all)]::
+	Create a zip archive of information about the repository including logs
+	and certain statistics describing the data shape of the repository. The
+	archive is written to the same output directory as the bug report and is
+	named 'git-diagnostics-<formatted suffix>'.
++
+By default, `--diagnose` (equivalent to `--diagnose=basic`) will collect only
+statistics and summarized data about the repository and filesystem. Specifying
+`--diagnose=all` will create an archive with the same contents generated by `git
+diagnose --all`; this archive will be much larger, and will contain potentially
+sensitive information about the repository. See linkgit:git-diagnose[1] for more
+details on the contents of the diagnostic archive.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 9de32bc96e7..1907258d61d 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -5,7 +5,13 @@
 #include "compat/compiler.h"
 #include "hook.h"
 #include "hook-list.h"
+#include "diagnose.h"
 
+enum diagnose_mode {
+	DIAGNOSE_NONE,
+	DIAGNOSE_BASIC,
+	DIAGNOSE_ALL
+};
 
 static void get_system_info(struct strbuf *sys_info)
 {
@@ -91,6 +97,23 @@ static void get_header(struct strbuf *buf, const char *title)
 	strbuf_addf(buf, "\n\n[%s]\n", title);
 }
 
+static int option_parse_diagnose(const struct option *opt,
+				 const char *arg, int unset)
+{
+	enum diagnose_mode *diagnose = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+
+	if (!arg || !strcmp(arg, "basic"))
+		*diagnose = DIAGNOSE_BASIC;
+	else if (!strcmp(arg, "all"))
+		*diagnose = DIAGNOSE_ALL;
+	else
+		die(_("diagnose mode must be either 'basic' or 'all'"));
+
+	return 0;
+}
+
 int cmd_bugreport(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buffer = STRBUF_INIT;
@@ -98,16 +121,21 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	int report = -1;
 	time_t now = time(NULL);
 	struct tm tm;
+	enum diagnose_mode diagnose = DIAGNOSE_NONE;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
+	size_t output_path_len;
 
 	const struct option bugreport_options[] = {
+		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("(basic|all)"),
+			       N_("create an additional zip archive of detailed diagnostics"),
+			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG, option_parse_diagnose),
 		OPT_STRING('o', "output-directory", &option_output, N_("path"),
-			   N_("specify a destination for the bugreport file")),
+			   N_("specify a destination for the bugreport file(s)")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
-			   N_("specify a strftime format suffix for the filename")),
+			   N_("specify a strftime format suffix for the filename(s)")),
 		OPT_END()
 	};
 
@@ -119,6 +147,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 					    option_output ? option_output : "");
 	strbuf_addstr(&report_path, prefixed_filename);
 	strbuf_complete(&report_path, '/');
+	output_path_len = report_path.len;
 
 	strbuf_addstr(&report_path, "git-bugreport-");
 	strbuf_addftime(&report_path, option_suffix, localtime_r(&now, &tm), 0, 0);
@@ -133,6 +162,20 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		    report_path.buf);
 	}
 
+	/* Prepare diagnostics, if requested */
+	if (diagnose != DIAGNOSE_NONE) {
+		struct strbuf zip_path = STRBUF_INIT;
+		strbuf_add(&zip_path, report_path.buf, output_path_len);
+		strbuf_addstr(&zip_path, "git-diagnostics-");
+		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+		strbuf_addstr(&zip_path, ".zip");
+
+		if (create_diagnostics_archive(&zip_path, diagnose == DIAGNOSE_ALL))
+			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
+
+		strbuf_release(&zip_path);
+	}
+
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
index 08f5fe9caef..1a63b3803f3 100755
--- a/t/t0091-bugreport.sh
+++ b/t/t0091-bugreport.sh
@@ -78,4 +78,48 @@ test_expect_success 'indicates populated hooks' '
 	test_cmp expect actual
 '
 
+test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents by default
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=basic excludes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=basic -o report -s test >out &&
+
+	# Should not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=all includes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=all -o report -s test >out &&
+
+	# Should include .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all'
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04  6:54     ` Ævar Arnfjörð Bjarmason
  2022-08-04  1:45   ` [PATCH v2 10/10] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Replace implementation of 'scalar diagnose' with an internal invocation of
'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
by making it a direct alias of 'git diagnose' and removes some code in
'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
of 'git diagnose'), if that is desired in the future.

This introduces one minor change to the output of 'scalar diagnose', which
is that the prefix of the created zip archive is changed from 'scalar_' to
'git-diagnostics-'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index b10955531ce..fe2a0e9decb 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,7 +11,6 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -510,34 +509,20 @@ static int cmd_diagnose(int argc, const char **argv)
 		N_("scalar diagnose [<enlistment>]"),
 		NULL
 	};
-	struct strbuf zip_path = STRBUF_INIT;
-	time_t now = time(NULL);
-	struct tm tm;
+	struct strbuf diagnostics_root = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
 			     usage, 0);
 
-	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
-
-	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
-	strbuf_addftime(&zip_path,
-			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
-	strbuf_addstr(&zip_path, ".zip");
-	switch (safe_create_leading_directories(zip_path.buf)) {
-	case SCLD_EXISTS:
-	case SCLD_OK:
-		break;
-	default:
-		error_errno(_("could not create directory for '%s'"),
-			    zip_path.buf);
-		goto diagnose_cleanup;
-	}
+	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
+	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
 
-	res = create_diagnostics_archive(&zip_path, 1);
+	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
+		    "-o", diagnostics_root.buf, NULL) < 0)
+		res = -1;
 
-diagnose_cleanup:
-	strbuf_release(&zip_path);
+	strbuf_release(&diagnostics_root);
 	return res;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v2 10/10] scalar: update technical doc roadmap
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (8 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all' Victoria Dye via GitGitGadget
@ 2022-08-04  1:45   ` Victoria Dye via GitGitGadget
  2022-08-04 17:22   ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Junio C Hamano
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04  1:45 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update the Scalar roadmap to reflect the completion of generalizing 'scalar
diagnose' into 'git diagnose' and 'git bugreport --diagnose'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/technical/scalar.txt | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
index 08bc09c225a..f6353375f08 100644
--- a/Documentation/technical/scalar.txt
+++ b/Documentation/technical/scalar.txt
@@ -84,6 +84,9 @@ series have been accepted:
 
 - `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
 
+- `scalar-generalize-diagnose`: Move the functionality of `scalar diagnose`
+  into `git diagnose` and `git bugreport --diagnose`.
+
 Roughly speaking (and subject to change), the following series are needed to
 "finish" this initial version of Scalar:
 
@@ -91,12 +94,6 @@ Roughly speaking (and subject to change), the following series are needed to
   and implement `scalar help`. At the end of this series, Scalar should be
   feature-complete from the perspective of a user.
 
-- Generalize features not specific to Scalar: In the spirit of making Scalar
-  configure only what is needed for large repo performance, move common
-  utilities into other parts of Git. Some of this will be internal-only, but one
-  major change will be generalizing `scalar diagnose` for use with any Git
-  repository.
-
 - Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
   `git`, including updates to build and install it with the rest of Git. This
   change will incorporate Scalar into the Git CI and test framework, as well as
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently
  2022-08-04  1:45   ` [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-04  6:19     ` Ævar Arnfjörð Bjarmason
  2022-08-04 17:12       ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  6:19 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> If a directory added to the 'scalar diagnose' archiver does not exist, warn
> and return 0 from 'add_directory_to_archiver()' rather than failing with a
> fatal error. This handles a failure edge case where the '.git/logs' has not
> yet been created when running 'scalar diagnose', but extends to any
> situation where a directory may be missing in the '.git' dir.
>
> Now, when a directory is missing a warning is captured in the diagnostic
> logs. This provides a user with more complete information than if 'scalar
> diagnose' simply failed with an error.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  contrib/scalar/scalar.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 04046452284..b9092f0b612 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
>  					  const char *path, int recurse)
>  {
>  	int at_root = !*path;
> -	DIR *dir = opendir(at_root ? "." : path);
> +	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t len;
>  	int res = 0;
>  
> -	if (!dir)
> +	dir = opendir(at_root ? "." : path);
> +	if (!dir) {
> +		if (errno == ENOENT) {

Per [1] I think this is incorrect or overly strict. Let's not spew
warnings if the user "rm -rf .git/hooks" or whatever.

It might be valuable to note in some file in the archive such oddities
we find, but warning() won't give us that.

> +			warning(_("could not archive missing directory '%s'"), path);

In any case, this should be e.g.:

	warning_errno(_("could not archive directory '%s'"), path);

You already have an errno, so using *_errno() will add the standard
information about what the issue is.

> +			return 0;
> +		}
>  		return error_errno(_("could not open directory '%s'"), path);
> +	}
>  
>  	if (!at_root)
>  		strbuf_addf(&buf, "%s/", path);

1. https://lore.kernel.org/git/220610.86ilp9s1x7.gmgdl@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 05/10] scalar-diagnose: move functionality to common location
  2022-08-04  1:45   ` [PATCH v2 05/10] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
@ 2022-08-04  6:24     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  6:24 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
> diff --git a/diagnose.h b/diagnose.h
> new file mode 100644
> index 00000000000..e86e8a3c962
> --- /dev/null
> +++ b/diagnose.h
> @@ -0,0 +1,9 @@
> +#ifndef DIAGNOSE_H
> +#define DIAGNOSE_H
> +
> +#include "cache.h"

We don't need cache.h here, just ...

> +#include "strbuf.h"

...this, also a matter of preference, but we could also just skip the
includes here and use a forward decl (as is common in other headers):

	struct strbuf;

> +
> +int create_diagnostics_archive(struct strbuf *zip_path);
> +
> +#endif /* DIAGNOSE_H */


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-04  1:45   ` [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
@ 2022-08-04  6:27     ` Ævar Arnfjörð Bjarmason
  2022-08-05 19:38       ` Derrick Stolee
  2022-08-05 19:11     ` Derrick Stolee
  1 sibling, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  6:27 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Create a 'git diagnose' builtin to generate a standalone zip archive of
> repository diagnostics.

It's good to have this as a built-in separate from "git bugreport",
but...

> +git-diagnose - Generate a zip archive of diagnostic information

...I'd really prefer for this not to squat on such a common name we
might regret having reserved later for such very specific
functionality. I'd think e.g. these would be better:

	git mk-diagnostics-zip

Or maybe:

	git archive-interesting-for-report

If I had to guess what a "git diagnose" did, I'd probably think:

 * It analyzes your config, and suggests redundancies/alternatives
 * It does some perf tests / heuritics, and e.g. suggests you turn on
   the commit-graph writing.

etc., this (arguably even too generic then) made sense as "scalar
diagnose" because scalar is all about being an opinionated interface
targeted at performance", so there's an implied "my repo's performance"
following a "scalar diagnose".

But as a top-level command-name I think we should pick something more
specific to what it does, which is (I haven't fully read ahead in the
re-roll, but I'm assuming is) mostly/entirely to be a "helper" for
"scalar diagnose" and/or "git bugreport".

> +	switch (safe_create_leading_directories(zip_path.buf)) {
> +	case SCLD_OK:
> +	case SCLD_EXISTS:
> +		break;
> +	default:
> +		die(_("could not create leading directories for '%s'"),
> +		    zip_path.buf);

This seems to be carrying forward a minor bug from bugreport.c we should
probably fix: we should use die_errno() here (and maybe lead with a
commit to fix bugreport.c's version).

The strbuf*() before that also seems like a good candidate for a utility
function in your new diagnose library, i.e. to have both bugreport.c and
diagnose.c pass it the prefix/suffix/format, then try to create that
directory, then replace the copy/pasting here with a one-line call to
that now-shared function.

The two codepaths only seem to differ in the prefix & suffix from a
quick skim, the rest is all copy/pasted...


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all'
  2022-08-04  1:45   ` [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all' Victoria Dye via GitGitGadget
@ 2022-08-04  6:39     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  6:39 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
>  --------
>  [verse]
>  'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
> +	       [-a | --all]

I have some local patches that...

>  static const char * const diagnose_usage[] = {
> -	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
> +	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>] [-a|--all]"),
>  	NULL
>  };

...spot when we have SYNOPSIS & -h discrepancies. In this case we break
with a \n after <format> in the SYNOPSIS, but you don't add a "\n" and
indentation here in the -h output. The two should be the same.

> @@ -13,6 +13,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
>  	struct strbuf zip_path = STRBUF_INIT;
>  	time_t now = time(NULL);
>  	struct tm tm;
> +	int include_everything = 0;
>  	char *option_output = NULL;
>  	char *option_suffix = "%Y-%m-%d-%H%M";
>  	char *prefixed_filename;
> @@ -22,6 +23,9 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
>  			   N_("specify a destination for the diagnostics archive")),
>  		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
>  			   N_("specify a strftime format suffix for the filename")),
> +		OPT_BOOL_F('a', "all", &include_everything,
> +			   N_("collect complete diagnostic information"),
> +			   PARSE_OPT_NONEG),

Nice to have a "stats only" by default and some way to add the whole
shebang optionally...

> +int create_diagnostics_archive(struct strbuf *zip_path, int include_everything)

...but maybe...
>  {
>  	struct strvec archiver_args = STRVEC_INIT;
>  	char **argv_copy = NULL;
> @@ -176,11 +176,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>  	loose_objs_stats(&buf, ".git/objects");
>  	strvec_push(&archiver_args, buf.buf);
>  
> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +	/* Only include this if explicitly requested */
> +	if (include_everything &&
> +	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
>  		goto diagnose_cleanup;

...this should be --include-gitdir-extract or something, instead of
"--all" and "--include-everything".

I'd think that "all" would be a thing that would actually tar up my
entire .git directory as-is (in a way that would pass git fsck on the
other end (unless that's the bug being reported...)).

Aside: Since we are getting the churn of adding this, then re-indenting
 it maybe a prep step of adding a add_directories_to_archiver() would be
 useful, which would just have a data-driven:

	{
		{ ".git" },
		[...],
		{ ".git/logs, 1 },
		NULL
	},

Then loop over that, making it easy to add/declare new subdirs to add.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all'
  2022-08-04  1:45   ` [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all' Victoria Dye via GitGitGadget
@ 2022-08-04  6:54     ` Ævar Arnfjörð Bjarmason
  2022-08-09 16:54       ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  6:54 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Replace implementation of 'scalar diagnose' with an internal invocation of
> 'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
> by making it a direct alias of 'git diagnose' and removes some code in
> 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
> alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
> of 'git diagnose'), if that is desired in the future.
>
> This introduces one minor change to the output of 'scalar diagnose', which
> is that the prefix of the created zip archive is changed from 'scalar_' to
> 'git-diagnostics-'.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  contrib/scalar/scalar.c | 29 +++++++----------------------
>  1 file changed, 7 insertions(+), 22 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index b10955531ce..fe2a0e9decb 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,7 +11,6 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> -#include "diagnose.h"
>  
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -510,34 +509,20 @@ static int cmd_diagnose(int argc, const char **argv)
>  		N_("scalar diagnose [<enlistment>]"),
>  		NULL
>  	};
> -	struct strbuf zip_path = STRBUF_INIT;
> -	time_t now = time(NULL);
> -	struct tm tm;
> +	struct strbuf diagnostics_root = STRBUF_INIT;
>  	int res = 0;
>  
>  	argc = parse_options(argc, argv, NULL, options,
>  			     usage, 0);
>  
> -	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
> -
> -	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> -	strbuf_addftime(&zip_path,
> -			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> -	strbuf_addstr(&zip_path, ".zip");
> -	switch (safe_create_leading_directories(zip_path.buf)) {
> -	case SCLD_EXISTS:
> -	case SCLD_OK:
> -		break;
> -	default:
> -		error_errno(_("could not create directory for '%s'"),
> -			    zip_path.buf);
> -		goto diagnose_cleanup;

Just spotting this now, but we had ad error, but we "goto
diagnose_cleanup", but that will use our "res = 0" above.

Is this untested already or in this series (didn't go back to look). But
maybe a moot point, the post-image replacement uses die()..

> -	}
> +	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
> +	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
>  
> -	res = create_diagnostics_archive(&zip_path, 1);
> +	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
> +		    "-o", diagnostics_root.buf, NULL) < 0)
> +		res = -1;

The code handling here seems really odd, issues:

 * This *can* return -1, if start_command() fails, but that's by far the
   rarer case, usually it would be 0 or >0 (only <0 if we can't start
   the command at all).

 * You should not be returning -1 from cmd_*() in general (we have
   outstanding issues with it, but those should be fixed). It will yield
   an exit code of 255 (but it's not portable)).

 * If you're going to return -1 at all, why override <0 with -1, just
   "res = run_git(...)" instead?

I think all-in-all this should be:

	res = run_git(...);

Then:

>  
> -diagnose_cleanup:
> -	strbuf_release(&zip_path);
> +	strbuf_release(&diagnostics_root);
>  	return res;

	return res < 0 ? -res : res;

Or whatever.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently
  2022-08-04  6:19     ` Ævar Arnfjörð Bjarmason
@ 2022-08-04 17:12       ` Junio C Hamano
  2022-08-04 20:12         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-04 17:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Victoria Dye

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	dir = opendir(at_root ? "." : path);
>> +	if (!dir) {
>> +		if (errno == ENOENT) {
>
> Per [1]

"Per [1]" somehow sounds more like a reference to an authoritative
source, at least to me.  Every time you use it, I have to see what
it refers to, and after realizing that you used it as a replacement
of "I said it already in [1]" again, it leaves a funny feeling.

> I think this is incorrect or overly strict. Let's not spew
> warnings if the user "rm -rf .git/hooks" or whatever.

The above is doing the right thing even in that situation, doesn't
it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
a warning to indicate that we found an unusual situation, and return
without failing.  If we got something other than ENOENT, we fail with
error_errno(), because opendir() failed for a reason other than "No
such file or directory".

> You already have an errno, so using *_errno() will add the standard
> information about what the issue is.

Reading the code aloud, slowly, may help.  When errno says ENOENT,
we know opendir() failed because of "No such file or directory",
so "path" was missing.  So let's say 'not archiving a missing directory'".

ENOENT or "No such file or directory" is an implementation detail
that does not help the end user.

The other side, i.e. when the errno is *not* ENOENT, already uses
error_errno().

So, I am puzzled.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (9 preceding siblings ...)
  2022-08-04  1:45   ` [PATCH v2 10/10] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
@ 2022-08-04 17:22   ` Junio C Hamano
  2022-08-09 16:17     ` Victoria Dye
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
  11 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-04 17:22 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * (Almost) entirely redesigned the UI for generating diagnostics. The new
>    approach avoids cluttering 'git bugreport' with a mode that doesn't
>    actually generate a report. Now, there are distinct options for different
>    use cases: generating extra diagnostics with a bug report ('git bugreport
>    --diagnose') and generating diagnostics for personal debugging/addition
>    to an existing bug report ('git diagnose').

An additional command gives us far more design flexibility, and in
this case I think it may be worth it.  It has a risk of confusing
users between "git bugreport --diag" and "git diagnose --report" a
way to send a report with diagnostic information, though.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently
  2022-08-04 17:12       ` Junio C Hamano
@ 2022-08-04 20:12         ` Ævar Arnfjörð Bjarmason
  2022-08-04 21:09           ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04 20:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Victoria Dye


On Thu, Aug 04 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +	dir = opendir(at_root ? "." : path);
>>> +	if (!dir) {
>>> +		if (errno == ENOENT) {
>>
>> Per [1]
>
> "Per [1]" somehow sounds more like a reference to an authoritative
> source, at least to me.  Every time you use it, I have to see what
> it refers to, and after realizing that you used it as a replacement
> of "I said it already in [1]" again, it leaves a funny feeling.

"Per" in the sense of "Per what I noted in [1]".

>> I think this is incorrect or overly strict. Let's not spew
>> warnings if the user "rm -rf .git/hooks" or whatever.
>
> The above is doing the right thing even in that situation, doesn't
> it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
> a warning to indicate that we found an unusual situation, and return
> without failing.  If we got something other than ENOENT, we fail with
> error_errno(), because opendir() failed for a reason other than "No
> such file or directory".

I'm mainly noting that the point of this step is to produce an archive
for the consumption of the remote end.

Therefore it seems to me like it would me much more useful to note these
"oddities" in some log that we're about to zip up, rather than issue a
warning().

The "per [1]" was a reference to the (paraphrasing) "maybe that's not
needed at all", but you seemed to think so. But for the purposes of the
discussion here let's assume we keep it.

>> You already have an errno, so using *_errno() will add the standard
>> information about what the issue is.
>
> Reading the code aloud, slowly, may help.  When errno says ENOENT,
> we know opendir() failed because of "No such file or directory",
> so "path" was missing.  So let's say 'not archiving a missing directory'".
>
> ENOENT or "No such file or directory" is an implementation detail
> that does not help the end user.
>
> The other side, i.e. when the errno is *not* ENOENT, already uses
> error_errno().
>
> So, I am puzzled.

I'm pointing out that we don't need to include that part in the message,
because warning_errno() will already give us that for free. I.e.:

	warning: could not archive directory '<some dir>': No such file or directory

v.s.:

	warning: could not archive missing directory '<some dir>'

The advantages of doing so being:

 * It's clear (at least to the keen eye) that it's using the "errno
   format", so you know it's not just saying "could not for <whatever
   reason>", it specifically got ENOENT.

 * The i18n for the strerror() comes from the C library, which will be
   translated already, whereas a new git.pot message won't be (but we'll
   hopefully bridge the gap eventually).

 * This way we we can share the message, whatever the errno happens to
   be, so we could e.g.:

      errno = ENOENT;
      warning_errno(_("could not archive directory '%s'"), "<some dir>");
      errno = ENOMEM;
      error_errno(_("could not archive directory '%s'"), "<some dir>");

   Whereas putting the reason for why we couldn't (which just duplicates
   the errno) in the message forces the messages & i18n to diverge.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently
  2022-08-04 20:12         ` Ævar Arnfjörð Bjarmason
@ 2022-08-04 21:09           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-04 21:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Victoria Dye

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I'm mainly noting that the point of this step is to produce an archive
> for the consumption of the remote end.
>
> Therefore it seems to me like it would me much more useful to note these
> "oddities" in some log that we're about to zip up, rather than issue a
> warning().

Hmph, the receiving end that inspects the archive will know that a
directory did not get archived, but they cannot tell if that is
because it did not exist, or because it was unreadable, and the
trouble the user may be having can well be the result of having
the directory unreadable.  So from that point of view, in addition
to these warnings and errors, it would be helpful to record the
errors we encounter while we generate the diagnostic archive in the
archive itself for inspection.

But the warning on the local side has merit to warn the user of an
unusual situation.  "I am puzzled why the hook I thought I wrote did
not trigger, but the diag tool says I do not have .git/hooks at all"
is a welcome side effect, even though it may not be the primary
effect we are aiming to gain by having these warning messages.

> I'm pointing out that we don't need to include that part in the message,
> because warning_errno() will already give us that for free. I.e.:
>
> 	warning: could not archive directory '<some dir>': No such file or directory
>
> v.s.:
>
> 	warning: could not archive missing directory '<some dir>'
>
> The advantages of doing so being:
>
>  * It's clear (at least to the keen eye) that it's using the "errno
>    format", so you know it's not just saying "could not for <whatever
>    reason>", it specifically got ENOENT.

Funny.  I find it much clearer if we can use our own message without
having to rely on whatever strerror(error) gives us.  We know better
than the C library why we got ENOENT and be more readable.  They say
"No such file or directory" because from ENOENT alone they cannot
tell you if it was a file or a directory that was missing, but we
have a better context like "we were trying to create an archive" and
"we tried opendir, expecting that the thing is a directory".

>  * The i18n for the strerror() comes from the C library, which will be
>    translated already, whereas a new git.pot message won't be (but we'll
>    hopefully bridge the gap eventually).

As I said above, I do not think it is an advantage in this case that
strerror() is translated, as the point of having a separate message
is because we can be more to-the-point, and we do not need to use
the strerror() result in there.

>  * This way we we can share the message, whatever the errno happens to
>    be, so we could e.g.:
>
>       errno = ENOENT;
>       warning_errno(_("could not archive directory '%s'"), "<some dir>");
>       errno = ENOMEM;
>       error_errno(_("could not archive directory '%s'"), "<some dir>");
>
>    Whereas putting the reason for why we couldn't (which just duplicates
>    the errno) in the message forces the messages & i18n to diverge.

And an added clarity that we can use a separate message is something
I think is worth having, compared to the cost of having an extra
message over the more generic one for any errno.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-04  1:45   ` [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
  2022-08-04  6:27     ` Ævar Arnfjörð Bjarmason
@ 2022-08-05 19:11     ` Derrick Stolee
  1 sibling, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-08-05 19:11 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason,
	Victoria Dye

On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Create a 'git diagnose' builtin to generate a standalone zip archive of
> repository diagnostics.

> +  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
> +    `.git/objects/info` directories

You remove these lines in the next patch, which is called "gate certain data
behind '--all'" but maybe we shouldn't have this functionality now and
instead add it in the future.

The biggest reason for the --all option is that these contents will likely
include private IP (path names and branch names, but not file contents) that
the user would probably not want to share with the public mailing list, but
might want to share with a trusted Git expert in order to resolve a problem.
You mention earlier that

  The generated archive can then, for example, be shared with the
  Git mailing list to help debug an issue or serve as a reference for
  independent debugging.

So, if you're sending a v3, then moving this out of this patch and into the
next one would be a good way to be sure that this possibly-private data is
not mentioned as something to share super publicly.

(Of course, this requires making the change to create_diagnostics_archive()
in advance of creating the builtin, so maybe this reorganization isn't
worth it.)

> @@ -0,0 +1,58 @@
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "diagnose.h"
> +
> +

nit: double empty line

> +++ b/t/t0092-diagnose.sh
> @@ -0,0 +1,28 @@
> +#!/bin/sh
> +
> +test_description='git diagnose'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success UNZIP 'creates diagnostics zip archive' '
> +	test_when_finished rm -rf report &&
> +
> +	git diagnose -o report -s test >out &&
> +
> +	zip_path=report/git-diagnostics-test.zip &&
> +	grep "Available space" out &&
> +	test_path_is_file "$zip_path" &&

nit: 'grep' the output immediately after the 'git diagnose' command and
keep the zip_path use immediately after its definition.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-04  1:45   ` [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-05 19:35     ` Derrick Stolee
  2022-08-09 23:53       ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-08-05 19:35 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason,
	Victoria Dye

On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>

> +--diagnose[=(basic|all)]::
> +	Create a zip archive of information about the repository including logs

logs? I think the reflogs are not included unless "all" is specified. Perhaps
we can unify this description with the beginning of git-diagnose.txt:

  Collects detailed information about the user's machine, Git client, and
  repository state and packages that information into a zip archive.

resulting in

	Create a zip archive containing information about the user's machine,
	Git client, and repository state.

> +	and certain statistics describing the data shape of the repository. The
> +	archive is written to the same output directory as the bug report and is
> +	named 'git-diagnostics-<formatted suffix>'.
> ++
> +By default, `--diagnose` (equivalent to `--diagnose=basic`) will collect only
> +statistics and summarized data about the repository and filesystem. Specifying
> +`--diagnose=all` will create an archive with the same contents generated by `git
> +diagnose --all`; this archive will be much larger, and will contain potentially
> +sensitive information about the repository. See linkgit:git-diagnose[1] for more
> +details on the contents of the diagnostic archive.

Perhaps here (and git-diagnose.txt) should be really explicit about sharing the
"all" mode output only with trusted parties. Let the user decide what level of
trust is necessary depending on their situation (we don't need to say "open source
repos are fine to share" or something).

> +enum diagnose_mode {
> +	DIAGNOSE_NONE,
> +	DIAGNOSE_BASIC,
> +	DIAGNOSE_ALL
> +};

This enum makes me think that it might be nice to use this in diagnose.h
along with an array that pairs strings with the enum. We could unify the
options by having 'git diagnose --mode=(basic|all)' which could be
extended in the future with another mode that might be in between the two.

It may also be a waste of time to set up that infrastructure without it
actually mattering in the future, but I thought I'd mention it as an
alternative, in case that inspires you.

>  static void get_system_info(struct strbuf *sys_info)
>  {
> @@ -91,6 +97,23 @@ static void get_header(struct strbuf *buf, const char *title)
>  	strbuf_addf(buf, "\n\n[%s]\n", title);
>  }
>  
> +static int option_parse_diagnose(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	enum diagnose_mode *diagnose = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +
> +	if (!arg || !strcmp(arg, "basic"))
> +		*diagnose = DIAGNOSE_BASIC;
> +	else if (!strcmp(arg, "all"))
> +		*diagnose = DIAGNOSE_ALL;

Should we allow "none" to reset the value to DIAGNOSE_NONE?

> +	else
> +		die(_("diagnose mode must be either 'basic' or 'all'"));

I wondered initially if this should be a usage() call instead. But we have
plenty of examples of using die() to report an issue with a single option
or a combination of options.

>  	const struct option bugreport_options[] = {
> +		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("(basic|all)"),
> +			       N_("create an additional zip archive of detailed diagnostics"),
> +			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG, option_parse_diagnose),

The biggest reason for this to be an OPT_CALLBACK_F is because of the
'--diagnose' option (without '='), so an OPT_STRING would not be
appropriate here.

> @@ -119,6 +147,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  					    option_output ? option_output : "");
>  	strbuf_addstr(&report_path, prefixed_filename);
>  	strbuf_complete(&report_path, '/');
> +	output_path_len = report_path.len;

Perhaps this should be renamed to output_dir_len, since we know this is
a directory that will contain all of the output files.

> +	/* Prepare diagnostics, if requested */
> +	if (diagnose != DIAGNOSE_NONE) {
> +		struct strbuf zip_path = STRBUF_INIT;
> +		strbuf_add(&zip_path, report_path.buf, output_path_len);
> +		strbuf_addstr(&zip_path, "git-diagnostics-");
> +		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
> +		strbuf_addstr(&zip_path, ".zip");
> +
> +		if (create_diagnostics_archive(&zip_path, diagnose == DIAGNOSE_ALL))

(Just pausing to say this could be create_diagnostics_archive(&zip_path, diagnose)
if we use the enum inside diagnose.c.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-04  6:27     ` Ævar Arnfjörð Bjarmason
@ 2022-08-05 19:38       ` Derrick Stolee
  2022-08-11 11:06         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-08-05 19:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, johannes.schindelin, Victoria Dye

On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>> repository diagnostics.
> 
> It's good to have this as a built-in separate from "git bugreport",
> but...
> 
>> +git-diagnose - Generate a zip archive of diagnostic information
> 
> ...I'd really prefer for this not to squat on such a common name we
> might regret having reserved later for such very specific
> functionality. I'd think e.g. these would be better:
> 
> 	git mk-diagnostics-zip
> 
> Or maybe:
> 
> 	git archive-interesting-for-report

These are not realistic replacements.

> If I had to guess what a "git diagnose" did, I'd probably think:
> 
>  * It analyzes your config, and suggests redundancies/alternatives
>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>    the commit-graph writing.

These sound like great options to add in the future, such as:

   --perf-test: Run performance tests on your repository using different
   Git config options and recommend certain settings.

(This --perf-test option would be a great way to get wider adoption
of parallel checkout, since its optimal settings are so machine
dependent.)

The thing is, even if we did these other things, it would result in
some kind of document that summarizes the repository shape and features.
That kind of data is exactly what this version of 'git diagnose' does.

For now, it leaves the human reader responsible for making decisions
based on those documents, but they have been incredibly helpful when we
are _diagnosing_ issues users are having with their repositories.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-04 17:22   ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Junio C Hamano
@ 2022-08-09 16:17     ` Victoria Dye
  2022-08-09 16:50       ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-09 16:17 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  * (Almost) entirely redesigned the UI for generating diagnostics. The new
>>    approach avoids cluttering 'git bugreport' with a mode that doesn't
>>    actually generate a report. Now, there are distinct options for different
>>    use cases: generating extra diagnostics with a bug report ('git bugreport
>>    --diagnose') and generating diagnostics for personal debugging/addition
>>    to an existing bug report ('git diagnose').
> 
> An additional command gives us far more design flexibility, and in
> this case I think it may be worth it.  It has a risk of confusing
> users between "git bugreport --diag" and "git diagnose --report" a
> way to send a report with diagnostic information, though.

This is an interesting point and something I think users could plausibly run
into. I can think of a few of ways to address that:

1. Do nothing, wait for feedback from users.
2. Create hidden option(s) '--report' and/or '--bugreport' in 'git diagnose'
   that trigger a warning (or advice?) along the lines of "did you mean 'git
   bugreport --diagnose'?" and exit with 'usage()'.
3. Create visible options '--report' and/or '--bugreport' in 'git diagnose'
   that invoke 'git bugreport --diagnose'.

I'm leaning towards option 2, but I'd also understand not wanting to clutter
builtins with options it *doesn't* use for the sake of advising a user.

> 
> 


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-09 16:17     ` Victoria Dye
@ 2022-08-09 16:50       ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-09 16:50 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Ævar Arnfjörð Bjarmason

Victoria Dye <vdye@github.com> writes:

> This is an interesting point and something I think users could plausibly run
> into. I can think of a few of ways to address that:
>
> 1. Do nothing, wait for feedback from users.
> 2. Create hidden option(s) '--report' and/or '--bugreport' in 'git diagnose'
>    that trigger a warning (or advice?) along the lines of "did you mean 'git
>    bugreport --diagnose'?" and exit with 'usage()'.
> 3. Create visible options '--report' and/or '--bugreport' in 'git diagnose'
>    that invoke 'git bugreport --diagnose'.
>
> I'm leaning towards option 2, but I'd also understand not wanting to clutter
> builtins with options it *doesn't* use for the sake of advising a user.

I was leaning towards option 1, actually ;-)

I'm offline today, so please expect no changes to the public
repositories.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all'
  2022-08-04  6:54     ` Ævar Arnfjörð Bjarmason
@ 2022-08-09 16:54       ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-09 16:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Replace implementation of 'scalar diagnose' with an internal invocation of
>> 'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
>> by making it a direct alias of 'git diagnose' and removes some code in
>> 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
>> alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
>> of 'git diagnose'), if that is desired in the future.
>>
>> This introduces one minor change to the output of 'scalar diagnose', which
>> is that the prefix of the created zip archive is changed from 'scalar_' to
>> 'git-diagnostics-'.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  contrib/scalar/scalar.c | 29 +++++++----------------------
>>  1 file changed, 7 insertions(+), 22 deletions(-)
>>
>> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
>> index b10955531ce..fe2a0e9decb 100644
>> --- a/contrib/scalar/scalar.c
>> +++ b/contrib/scalar/scalar.c
>> @@ -11,7 +11,6 @@
>>  #include "dir.h"
>>  #include "packfile.h"
>>  #include "help.h"
>> -#include "diagnose.h"
>>  
>>  /*
>>   * Remove the deepest subdirectory in the provided path string. Path must not
>> @@ -510,34 +509,20 @@ static int cmd_diagnose(int argc, const char **argv)
>>  		N_("scalar diagnose [<enlistment>]"),
>>  		NULL
>>  	};
>> -	struct strbuf zip_path = STRBUF_INIT;
>> -	time_t now = time(NULL);
>> -	struct tm tm;
>> +	struct strbuf diagnostics_root = STRBUF_INIT;
>>  	int res = 0;
>>  
>>  	argc = parse_options(argc, argv, NULL, options,
>>  			     usage, 0);
>>  
>> -	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
>> -
>> -	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
>> -	strbuf_addftime(&zip_path,
>> -			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
>> -	strbuf_addstr(&zip_path, ".zip");
>> -	switch (safe_create_leading_directories(zip_path.buf)) {
>> -	case SCLD_EXISTS:
>> -	case SCLD_OK:
>> -		break;
>> -	default:
>> -		error_errno(_("could not create directory for '%s'"),
>> -			    zip_path.buf);
>> -		goto diagnose_cleanup;
> 
> Just spotting this now, but we had ad error, but we "goto
> diagnose_cleanup", but that will use our "res = 0" above.
> 
> Is this untested already or in this series (didn't go back to look). But
> maybe a moot point, the post-image replacement uses die()..

Nice catch - this does appear to be a pre-existing bug in 'scalar diagnose'.
Given that both 'git diagnose' and 'git bugreport --diagnose' handle this
case more appropriately, though, I agree that it's a bit of a moot point and
not worth the churn created by a bugfix patch.

> 
>> -	}
>> +	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
>> +	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
>>  
>> -	res = create_diagnostics_archive(&zip_path, 1);
>> +	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
>> +		    "-o", diagnostics_root.buf, NULL) < 0)
>> +		res = -1;
> 
> The code handling here seems really odd, issues:
> 
>  * This *can* return -1, if start_command() fails, but that's by far the
>    rarer case, usually it would be 0 or >0 (only <0 if we can't start
>    the command at all).
> 
>  * You should not be returning -1 from cmd_*() in general (we have
>    outstanding issues with it, but those should be fixed). It will yield
>    an exit code of 255 (but it's not portable)).
> 
>  * If you're going to return -1 at all, why override <0 with -1, just
>    "res = run_git(...)" instead?

Thanks for the info, I'll replace the hardcoded '-1' return value with
something derived from 'res' in the next version.

> 
> I think all-in-all this should be:
> 
> 	res = run_git(...);
> 
> Then:
> 
>>  
>> -diagnose_cleanup:
>> -	strbuf_release(&zip_path);
>> +	strbuf_release(&diagnostics_root);
>>  	return res;
> 
> 	return res < 0 ? -res : res;
> 
> Or whatever.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-05 19:35     ` Derrick Stolee
@ 2022-08-09 23:53       ` Victoria Dye
  2022-08-10 12:52         ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-09 23:53 UTC (permalink / raw)
  To: Derrick Stolee, Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason

Derrick Stolee wrote:
> On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
> 
>> +--diagnose[=(basic|all)]::
>> +	Create a zip archive of information about the repository including logs
> 
> logs? I think the reflogs are not included unless "all" is specified. Perhaps
> we can unify this description with the beginning of git-diagnose.txt:
> 
>   Collects detailed information about the user's machine, Git client, and
>   repository state and packages that information into a zip archive.
> 
> resulting in
> 
> 	Create a zip archive containing information about the user's machine,
> 	Git client, and repository state.
> 

[...]

>> +	and certain statistics describing the data shape of the repository. The
>> +	archive is written to the same output directory as the bug report and is
>> +	named 'git-diagnostics-<formatted suffix>'.
>> ++
>> +By default, `--diagnose` (equivalent to `--diagnose=basic`) will collect only
>> +statistics and summarized data about the repository and filesystem. Specifying
>> +`--diagnose=all` will create an archive with the same contents generated by `git
>> +diagnose --all`; this archive will be much larger, and will contain potentially
>> +sensitive information about the repository. See linkgit:git-diagnose[1] for more
>> +details on the contents of the diagnostic archive.
> 
> Perhaps here (and git-diagnose.txt) should be really explicit about sharing the
> "all" mode output only with trusted parties. Let the user decide what level of
> trust is necessary depending on their situation (we don't need to say "open source
> repos are fine to share" or something).

Both of these documentation suggestions make sense to me; I'll update
accordingly in V3.

> 
>> +enum diagnose_mode {
>> +	DIAGNOSE_NONE,
>> +	DIAGNOSE_BASIC,
>> +	DIAGNOSE_ALL
>> +};
> 
> This enum makes me think that it might be nice to use this in diagnose.h
> along with an array that pairs strings with the enum. We could unify the
> options by having 'git diagnose --mode=(basic|all)' which could be
> extended in the future with another mode that might be in between the two.
> 
> It may also be a waste of time to set up that infrastructure without it
> actually mattering in the future, but I thought I'd mention it as an
> alternative, in case that inspires you.

Your suggestion is more extensible than the boolean "include_everything" I'm
using right now in 'create_diagnostics_archive()'. I'll incorporate it into
my next re-roll, thanks!

> 
>>  static void get_system_info(struct strbuf *sys_info)
>>  {
>> @@ -91,6 +97,23 @@ static void get_header(struct strbuf *buf, const char *title)
>>  	strbuf_addf(buf, "\n\n[%s]\n", title);
>>  }
>>  
>> +static int option_parse_diagnose(const struct option *opt,
>> +				 const char *arg, int unset)
>> +{
>> +	enum diagnose_mode *diagnose = opt->value;
>> +
>> +	BUG_ON_OPT_NEG(unset);
>> +
>> +	if (!arg || !strcmp(arg, "basic"))
>> +		*diagnose = DIAGNOSE_BASIC;
>> +	else if (!strcmp(arg, "all"))
>> +		*diagnose = DIAGNOSE_ALL;
> 
> Should we allow "none" to reset the value to DIAGNOSE_NONE?

As far as I can tell, while some builtins have options that  match the
default behavior of the command (e.g., '--no-autosquash' in 'git rebase'),
those options typically exist to override a config setting (e.g.,
'rebase.autosquash'). No config exists for 'bugreport --diagnose' (and I
don't think it would make sense to add one), so '--diagnose=none' would only
be used to override another '--diagnose' specification in the same
command/alias (e.g., 'git bugreport --diagnose=basic --diagnose=none'). 

That use case seems unlikely, but if there's precedent or use cases I'm not
accounting for, I'm happy to add the option.

> 
>> +	else
>> +		die(_("diagnose mode must be either 'basic' or 'all'"));
> 
> I wondered initially if this should be a usage() call instead. But we have
> plenty of examples of using die() to report an issue with a single option
> or a combination of options.

After looking at other 'OPT_CALLBACK_F' examples (e.g., the 'diff_opt_stat'
options in 'diff.c'), I think this should at least change to 'return
error(<something>)' to get a more appropriate return code (129):

	else
		return error(_("invalid --%s mode '%s'"), opt->long_name, arg);

I also looked into using 'usage_msg_opt()' to print both an error message
and the usage string, but doing so would require making the 'bugreport'
options structure & variables static so they're accessible by
'option_parse_diagnose()'. If you think it would be valuable to include that
information, I'll add an extra refactoring patch to include it.

> 
>>  	const struct option bugreport_options[] = {
>> +		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("(basic|all)"),
>> +			       N_("create an additional zip archive of detailed diagnostics"),
>> +			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG, option_parse_diagnose),
> 
> The biggest reason for this to be an OPT_CALLBACK_F is because of the
> '--diagnose' option (without '='), so an OPT_STRING would not be
> appropriate here.
> 
>> @@ -119,6 +147,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>>  					    option_output ? option_output : "");
>>  	strbuf_addstr(&report_path, prefixed_filename);
>>  	strbuf_complete(&report_path, '/');
>> +	output_path_len = report_path.len;
> 
> Perhaps this should be renamed to output_dir_len, since we know this is
> a directory that will contain all of the output files.

Agreed, 'path' is a bit ambiguous and 'dir' more clearly refers to the
directory containing the output files. 

> 
>> +	/* Prepare diagnostics, if requested */
>> +	if (diagnose != DIAGNOSE_NONE) {
>> +		struct strbuf zip_path = STRBUF_INIT;
>> +		strbuf_add(&zip_path, report_path.buf, output_path_len);
>> +		strbuf_addstr(&zip_path, "git-diagnostics-");
>> +		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
>> +		strbuf_addstr(&zip_path, ".zip");
>> +
>> +		if (create_diagnostics_archive(&zip_path, diagnose == DIAGNOSE_ALL))
> 
> (Just pausing to say this could be create_diagnostics_archive(&zip_path, diagnose)
> if we use the enum inside diagnose.c.
> 
> Thanks,
> -Stolee


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-09 23:53       ` Victoria Dye
@ 2022-08-10 12:52         ` Derrick Stolee
  2022-08-10 16:13           ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-08-10 12:52 UTC (permalink / raw)
  To: Victoria Dye, Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason

On 8/9/22 7:53 PM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:

>>> +static int option_parse_diagnose(const struct option *opt,
>>> +				 const char *arg, int unset)
>>> +{
>>> +	enum diagnose_mode *diagnose = opt->value;
>>> +
>>> +	BUG_ON_OPT_NEG(unset);
>>> +
>>> +	if (!arg || !strcmp(arg, "basic"))
>>> +		*diagnose = DIAGNOSE_BASIC;
>>> +	else if (!strcmp(arg, "all"))
>>> +		*diagnose = DIAGNOSE_ALL;
>>
>> Should we allow "none" to reset the value to DIAGNOSE_NONE?
> 
> As far as I can tell, while some builtins have options that  match the
> default behavior of the command (e.g., '--no-autosquash' in 'git rebase'),
> those options typically exist to override a config setting (e.g.,
> 'rebase.autosquash'). No config exists for 'bugreport --diagnose' (and I
> don't think it would make sense to add one), so '--diagnose=none' would only
> be used to override another '--diagnose' specification in the same
> command/alias (e.g., 'git bugreport --diagnose=basic --diagnose=none'). 

Ah, so --diagnose=none isn't valuable because --no-diagnose would be
the better way to write the same thing. You would need to remove the
PARSE_OPT_NONEG from your OPT_CALLBACK_F() to allow that (and then do
the appropriate logic with the "unset" parameter).

The reason to have these things is basically so users can create
aliases (say 'git br' expands to 'git bugreport --diagnose=all', but
they want to run 'git br --no-diagnose' to clear that --diagnose=all).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-10 12:52         ` Derrick Stolee
@ 2022-08-10 16:13           ` Victoria Dye
  2022-08-10 16:47             ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-10 16:13 UTC (permalink / raw)
  To: Derrick Stolee, Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason

Derrick Stolee wrote:
> On 8/9/22 7:53 PM, Victoria Dye wrote:
>> Derrick Stolee wrote:
>>> On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
> 
>>>> +static int option_parse_diagnose(const struct option *opt,
>>>> +				 const char *arg, int unset)
>>>> +{
>>>> +	enum diagnose_mode *diagnose = opt->value;
>>>> +
>>>> +	BUG_ON_OPT_NEG(unset);
>>>> +
>>>> +	if (!arg || !strcmp(arg, "basic"))
>>>> +		*diagnose = DIAGNOSE_BASIC;
>>>> +	else if (!strcmp(arg, "all"))
>>>> +		*diagnose = DIAGNOSE_ALL;
>>>
>>> Should we allow "none" to reset the value to DIAGNOSE_NONE?
>>
>> As far as I can tell, while some builtins have options that  match the
>> default behavior of the command (e.g., '--no-autosquash' in 'git rebase'),
>> those options typically exist to override a config setting (e.g.,
>> 'rebase.autosquash'). No config exists for 'bugreport --diagnose' (and I
>> don't think it would make sense to add one), so '--diagnose=none' would only
>> be used to override another '--diagnose' specification in the same
>> command/alias (e.g., 'git bugreport --diagnose=basic --diagnose=none'). 
> 
> Ah, so --diagnose=none isn't valuable because --no-diagnose would be
> the better way to write the same thing. You would need to remove the
> PARSE_OPT_NONEG from your OPT_CALLBACK_F() to allow that (and then do
> the appropriate logic with the "unset" parameter).

I'm not sure I follow. I wasn't suggesting a difference in value between
'--no-diagnose' and '--diagnose=none'. My point was that, when there's an
option variant that "resets" the value to the default (like
'--no-autosquash', '--no-recurse-submodules', etc.), it usually *also*
corresponds to an overridable config setting ('rebase.autosquash',
'push.recurseSubmodules'). No such 'bugreport.diagnose' config exists (or,
IMO, should exist), so the need for a "reset to default" option seemed
weaker. 

I used boolean options as my examples, but they aren't intended to imply a
meaningful difference between '--no-diagnose' amd '--diagnose=none'.

> 
> The reason to have these things is basically so users can create
> aliases (say 'git br' expands to 'git bugreport --diagnose=all', but
> they want to run 'git br --no-diagnose' to clear that --diagnose=all).

I considered that usage ("'--diagnose=none' would only be used to override
another '--diagnose' specification in the same command/alias"), but wasn't
sure how common it would be for this particular option. It sounds like you
can see it being useful, so I'll include '--diagnose=none' in the next
version.

> 
> Thanks,
> -Stolee


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option
  2022-08-10 16:13           ` Victoria Dye
@ 2022-08-10 16:47             ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-08-10 16:47 UTC (permalink / raw)
  To: Victoria Dye, Victoria Dye via GitGitGadget, git
  Cc: johannes.schindelin, Ævar Arnfjörð Bjarmason

On 8/10/2022 12:13 PM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 8/9/22 7:53 PM, Victoria Dye wrote:
>>> Derrick Stolee wrote:
>>>> On 8/3/2022 9:45 PM, Victoria Dye via GitGitGadget wrote:
>>
>>>>> +static int option_parse_diagnose(const struct option *opt,
>>>>> +				 const char *arg, int unset)
>>>>> +{
>>>>> +	enum diagnose_mode *diagnose = opt->value;
>>>>> +
>>>>> +	BUG_ON_OPT_NEG(unset);
>>>>> +
>>>>> +	if (!arg || !strcmp(arg, "basic"))
>>>>> +		*diagnose = DIAGNOSE_BASIC;
>>>>> +	else if (!strcmp(arg, "all"))
>>>>> +		*diagnose = DIAGNOSE_ALL;
>>>>
>>>> Should we allow "none" to reset the value to DIAGNOSE_NONE?
>>>
>>> As far as I can tell, while some builtins have options that  match the
>>> default behavior of the command (e.g., '--no-autosquash' in 'git rebase'),
>>> those options typically exist to override a config setting (e.g.,
>>> 'rebase.autosquash'). No config exists for 'bugreport --diagnose' (and I
>>> don't think it would make sense to add one), so '--diagnose=none' would only
>>> be used to override another '--diagnose' specification in the same
>>> command/alias (e.g., 'git bugreport --diagnose=basic --diagnose=none'). 
>>
>> Ah, so --diagnose=none isn't valuable because --no-diagnose would be
>> the better way to write the same thing. You would need to remove the
>> PARSE_OPT_NONEG from your OPT_CALLBACK_F() to allow that (and then do
>> the appropriate logic with the "unset" parameter).
> 
> I'm not sure I follow. I wasn't suggesting a difference in value between
> '--no-diagnose' and '--diagnose=none'. My point was that, when there's an
> option variant that "resets" the value to the default (like
> '--no-autosquash', '--no-recurse-submodules', etc.), it usually *also*
> corresponds to an overridable config setting ('rebase.autosquash',
> 'push.recurseSubmodules'). No such 'bugreport.diagnose' config exists (or,
> IMO, should exist), so the need for a "reset to default" option seemed
> weaker. 

You're right that these make the most sense when there can be a
non-CLI source of a setting to unset, which is a better reason than
the alias reason I gave.
 
> I used boolean options as my examples, but they aren't intended to imply a
> meaningful difference between '--no-diagnose' amd '--diagnose=none'.
> 
>>
>> The reason to have these things is basically so users can create
>> aliases (say 'git br' expands to 'git bugreport --diagnose=all', but
>> they want to run 'git br --no-diagnose' to clear that --diagnose=all).
> 
> I considered that usage ("'--diagnose=none' would only be used to override
> another '--diagnose' specification in the same command/alias"), but wasn't
> sure how common it would be for this particular option. It sounds like you
> can see it being useful, so I'll include '--diagnose=none' in the next
> version.

This change would make it easier to add a config option in the future,
though I doubt we will need it as 'git bugreport' should be used too
infrequently to want to set up such config in advance.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
                     ` (10 preceding siblings ...)
  2022-08-04 17:22   ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Junio C Hamano
@ 2022-08-10 23:34   ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
                       ` (11 more replies)
  11 siblings, 12 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

As part of the preparation for moving Scalar out of 'contrib/' and into Git,
this series moves the functionality of 'scalar diagnose' into a new builtin
('git diagnose') and a new option ('--diagnose') for 'git bugreport'. This
change further aligns Scalar with the objective [1] of having it only
contain functionality and settings that benefit large Git repositories, but
not all repositories. The diagnostics reported by 'scalar diagnose' relevant
for investigating issues in any Git repository, so generating them should be
part of a "normal" Git builtin.

The series is organized as follows:

 * Miscellaneous fixes for the existing 'scalar diagnose' implementation
 * Moving the code for generating diagnostics into a common location in the
   Git tree
 * Implementing 'git diagnose'
 * Implementing 'git bugreport --diagnose'
 * Updating the Scalar roadmap

Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
'git bugreport --diagnose', it is not being deprecated in this series.
Although deprecation -> removal could be a future cleanup effort, 'scalar
diagnose' is kept around for now as an alias for users already accustomed to
using it in 'scalar'.


Changes since V2
================

 * Replaced 'int include_everything' arg to 'create_diagnostic_archive()'
   with 'enum diagnose_mode mode'.
 * Replaced '--all' with configurable '--mode' option in 'git diagnose';
   moved 'option_parse_diagnose()' into 'diagnose.c' so that it can be used
   for both 'git bugreport --diagnose' and 'git diagnose --mode'.
 * Split "builtin/diagnose.c: gate certain data behind '--all'" (formerly
   patch 7/10) into "diagnose.c: add option to configure archive contents"
   (patch 6/11) and "builtin/diagnose.c: add '--mode' option" (patch 8/11).
 * Added '--no-diagnose' for 'git bugreport'. I was initially going to use
   '--diagnose=none', but '--no-diagnose' was easier to configure when using
   the shared 'option_parse_diagnose()' function .
 * Updated usage strings, option descriptions, and documentation files for
   'mode' option. To avoid needing to keep multiple lists of valid 'mode'
   values up-to-date, format mode value as <mode> everywhere except option
   description in 'git-diagnose.txt', where the values are listed. The
   documentation of '--diagnose' in 'git-bugreport.txt' links to
   'git-diagnose.txt' and explicitly calls out that details on 'mode' can be
   found there.
 * Reworded 'git diagnose' and 'git bugreport' command & option
   documentation.
 * Added additional checks to 't0091-bugreport.sh' and 't0092-diagnose.sh'
   tests
 * Moved '#include "cache.h" from 'diagnose.h' to 'diagnose.c'.
 * Fixed '--output-directory' usage string in 'builtin/diagnose.c'.
 * Replaced 'die()' with 'die_errno()' in error triggered when leading
   directories of archive cannot be created.
 * Changed hardcoded '-1' error exit code in 'scalar diagnose' to returning
   the exit code from 'git diagnose --mode=all'.


Changes since V1
================

 * Reorganized patches to fix minor issues (e.g., more gently adding
   directories to the archive) of 'scalar diagnose' in 'scalar.c', before
   the code is moved out of that file.
 * (Almost) entirely redesigned the UI for generating diagnostics. The new
   approach avoids cluttering 'git bugreport' with a mode that doesn't
   actually generate a report. Now, there are distinct options for different
   use cases: generating extra diagnostics with a bug report ('git bugreport
   --diagnose') and generating diagnostics for personal debugging/addition
   to an existing bug report ('git diagnose').
 * Moved 'get_disk_info()' into 'compat/'.
 * Moved 'create_diagnostics_archive()' into a new 'diagnose.c', as it now
   has multiple callers.
 * Updated command & option documentation to more clearly guide users on how
   to use the new options.
 * Added the '--all' (and '--diagnose=all') option to change the default
   behavior of diagnostics generation to exclude '.git' directory contents.
   For many bug reporters, this would reveal private repository contents
   they don't want to expose to the public mailing list. This has the added
   benefit of creating much smaller archives by default, which will be more
   likely to successfully send to the mailing list.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/pull.1275.v2.git.1657584367.gitgitgadget@gmail.com/

Victoria Dye (11):
  scalar-diagnose: use "$GIT_UNZIP" in test
  scalar-diagnose: avoid 32-bit overflow of size_t
  scalar-diagnose: add directory to archiver more gently
  scalar-diagnose: move 'get_disk_info()' to 'compat/'
  scalar-diagnose: move functionality to common location
  diagnose.c: add option to configure archive contents
  builtin/diagnose.c: create 'git diagnose' builtin
  builtin/diagnose.c: add '--mode' option
  builtin/bugreport.c: create '--diagnose' option
  scalar-diagnose: use 'git diagnose --mode=all'
  scalar: update technical doc roadmap

 .gitignore                         |   1 +
 Documentation/git-bugreport.txt    |  18 ++
 Documentation/git-diagnose.txt     |  65 +++++++
 Documentation/technical/scalar.txt |   9 +-
 Makefile                           |   2 +
 builtin.h                          |   1 +
 builtin/bugreport.c                |  27 ++-
 builtin/diagnose.c                 |  61 +++++++
 compat/disk.h                      |  56 ++++++
 contrib/scalar/scalar.c            | 271 +----------------------------
 contrib/scalar/t/t9099-scalar.sh   |   8 +-
 diagnose.c                         | 254 +++++++++++++++++++++++++++
 diagnose.h                         |  17 ++
 git-compat-util.h                  |   1 +
 git.c                              |   1 +
 t/t0091-bugreport.sh               |  48 +++++
 t/t0092-diagnose.sh                |  60 +++++++
 17 files changed, 622 insertions(+), 278 deletions(-)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100644 compat/disk.h
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h
 create mode 100755 t/t0092-diagnose.sh


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1310%2Fvdye%2Fscalar%2Fgeneralize-diagnose-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1310/vdye/scalar/generalize-diagnose-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1310

Range-diff vs v2:

  1:  ad5b60bf11e =  1:  f5ceb9c7190 scalar-diagnose: use "$GIT_UNZIP" in test
  2:  7956dc24b30 =  2:  78a93eb95bb scalar-diagnose: avoid 32-bit overflow of size_t
  3:  23349bfaf8f =  3:  22ee8ea5a1e scalar-diagnose: add directory to archiver more gently
  4:  05bba1e699f =  4:  18f2ba4e0cd scalar-diagnose: move 'get_disk_info()' to 'compat/'
  5:  3a0cb33c658 !  5:  7a51fad87a8 scalar-diagnose: move functionality to common location
     @@ Commit message
          primary changes being that 'zip_path' is an input and "Enlistment root" is
          corrected to "Repository root" in the archiver log.
      
     +    Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Makefile ##
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      
       ## diagnose.c (new) ##
      @@
     ++#include "cache.h"
      +#include "diagnose.h"
      +#include "compat/disk.h"
      +#include "archive.h"
     @@ diagnose.h (new)
      +#ifndef DIAGNOSE_H
      +#define DIAGNOSE_H
      +
     -+#include "cache.h"
      +#include "strbuf.h"
      +
      +int create_diagnostics_archive(struct strbuf *zip_path);
  -:  ----------- >  6:  0a6c55696d8 diagnose.c: add option to configure archive contents
  6:  73e139ee377 !  7:  bf3c073a985 builtin/diagnose.c: create 'git diagnose' builtin
     @@ Commit message
          diagnostics gathered are not specific to Scalar-cloned repositories and
          can be useful when diagnosing issues in any Git repository.
      
     +    Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## .gitignore ##
     @@ Documentation/git-diagnose.txt (new)
      +    stores
      +  * The total count of loose objects, as well as counts broken down by
      +    `.git/objects` subdirectory
     -+  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
     -+    `.git/objects/info` directories
      +
      +This tool differs from linkgit:git-bugreport[1] in that it collects much more
      +detailed information with a greater focus on reporting the size and data shape
     @@ builtin/diagnose.c (new)
      +#include "parse-options.h"
      +#include "diagnose.h"
      +
     -+
      +static const char * const diagnose_usage[] = {
     -+	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
     ++	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
      +	NULL
      +};
      +
     @@ builtin/diagnose.c (new)
      +	case SCLD_EXISTS:
      +		break;
      +	default:
     -+		die(_("could not create leading directories for '%s'"),
     -+		    zip_path.buf);
     ++		die_errno(_("could not create leading directories for '%s'"),
     ++			  zip_path.buf);
      +	}
      +
      +	/* Prepare diagnostics */
     -+	if (create_diagnostics_archive(&zip_path))
     ++	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
      +		die_errno(_("unable to create diagnostics archive %s"),
      +			  zip_path.buf);
      +
     @@ t/t0092-diagnose.sh (new)
      +	test_when_finished rm -rf report &&
      +
      +	git diagnose -o report -s test >out &&
     ++	grep "Available space" out &&
      +
      +	zip_path=report/git-diagnostics-test.zip &&
     -+	grep "Available space" out &&
      +	test_path_is_file "$zip_path" &&
      +
      +	# Check zipped archive content
     @@ t/t0092-diagnose.sh (new)
      +	grep ".git/objects" out &&
      +
      +	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
     ++	grep "^Total: [0-9][0-9]*" out &&
     ++
     ++	# Should not include .git directory contents by default
     ++	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
      +	grep "^Total: [0-9][0-9]*" out
      +'
      +
  7:  a3e62a4a041 !  8:  3da0cb725c9 builtin/diagnose.c: gate certain data behind '--all'
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    builtin/diagnose.c: gate certain data behind '--all'
     +    builtin/diagnose.c: add '--mode' option
      
     -    Update 'git diagnose' to *not* include '.git/' directory contents by
     -    default, instead requiring specification of a '--all' option to include it.
     -    While helpful for debugging, the archived '.git/' directory contents may be
     -    sensitive, as they can be used to reconstruct an entire repository.
     +    Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally
     +    select non-default diagnostic information to include in the output archive.
     +    Additionally, document the currently-available modes, emphasizing the
     +    importance of not sharing a '--mode=all' archive publicly due to the
     +    presence of sensitive information.
      
     -    To guard against users inadvertently including this information in
     -    diagnostics and sharing it (e.g., with the mailing list), '.git/' directory
     -    contents will only be included if '--all' is specified.
     +    Note that the option parsing callback - 'option_parse_diagnose()' - is added
     +    to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in
     +    future callers configuring a diagnostics archive.
      
     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-diagnose.txt ##
     @@ Documentation/git-diagnose.txt: SYNOPSIS
       --------
       [verse]
       'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
     -+	       [-a | --all]
     ++	       [--mode=<mode>]
       
       DESCRIPTION
       -----------
     +@@ Documentation/git-diagnose.txt: repository state and packages that information into a zip archive. The
     + generated archive can then, for example, be shared with the Git mailing list to
     + help debug an issue or serve as a reference for independent debugging.
     + 
     +-The following information is captured in the archive:
     ++By default, the following information is captured in the archive:
     + 
     +   * 'git version --build-options'
     +   * The path to the repository root
      @@ Documentation/git-diagnose.txt: The following information is captured in the archive:
     -     stores
         * The total count of loose objects, as well as counts broken down by
           `.git/objects` subdirectory
     --  * The contents of the `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
     --    `.git/objects/info` directories
       
     ++Additional information can be collected by selecting a different diagnostic mode
     ++using the `--mode` option.
     ++
       This tool differs from linkgit:git-bugreport[1] in that it collects much more
       detailed information with a greater focus on reporting the size and data shape
     + of repository contents.
      @@ Documentation/git-diagnose.txt: OPTIONS
       	form of a strftime(3) format string; the current local time will be
       	used.
       
     -+-a::
     -+--all::
     -+	Include more complete repository diagnostic information in the archive.
     -+	Specifically, this will add copies of `.git`, `.git/hooks`, `.git/info`,
     -+	`.git/logs`, and `.git/objects/info` directories to the output archive.
     -+	This additional data may be sensitive; a user can reconstruct the full
     -+	contents of the diagnosed repository with this information. Users should
     -+	exercise caution when sharing an archive generated with this option.
     ++--mode=(stats|all)::
     ++	Specify the type of diagnostics that should be collected. The default behavior
     ++	of 'git diagnose' is equivalent to `--mode=stats`.
     +++
     ++The `--mode=all` option collects everything included in `--mode=stats`, as well
     ++as copies of `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
     ++`.git/objects/info` directories. This additional information may be sensitive,
     ++as it can be used to reconstruct the full contents of the diagnosed repository.
     ++Users should exercise caution when sharing an archive generated with
     ++`--mode=all`.
      +
       GIT
       ---
     @@ Documentation/git-diagnose.txt: OPTIONS
      
       ## builtin/diagnose.c ##
      @@
     - 
     + #include "diagnose.h"
       
       static const char * const diagnose_usage[] = {
     --	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
     -+	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>] [-a|--all]"),
     +-	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
     ++	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>] [--mode=<mode>]"),
       	NULL
       };
       
     @@ builtin/diagnose.c: int cmd_diagnose(int argc, const char **argv, const char *pr
       	struct strbuf zip_path = STRBUF_INIT;
       	time_t now = time(NULL);
       	struct tm tm;
     -+	int include_everything = 0;
     ++	enum diagnose_mode mode = DIAGNOSE_STATS;
       	char *option_output = NULL;
       	char *option_suffix = "%Y-%m-%d-%H%M";
       	char *prefixed_filename;
     @@ builtin/diagnose.c: int cmd_diagnose(int argc, const char **argv, const char *pr
       			   N_("specify a destination for the diagnostics archive")),
       		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
       			   N_("specify a strftime format suffix for the filename")),
     -+		OPT_BOOL_F('a', "all", &include_everything,
     -+			   N_("collect complete diagnostic information"),
     -+			   PARSE_OPT_NONEG),
     ++		OPT_CALLBACK_F(0, "mode", &mode, N_("(stats|all)"),
     ++			       N_("specify the content of the diagnostic archive"),
     ++			       PARSE_OPT_NONEG, option_parse_diagnose),
       		OPT_END()
       	};
       
     @@ builtin/diagnose.c: int cmd_diagnose(int argc, const char **argv, const char *pr
       	}
       
       	/* Prepare diagnostics */
     --	if (create_diagnostics_archive(&zip_path))
     -+	if (create_diagnostics_archive(&zip_path, include_everything))
     +-	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
     ++	if (create_diagnostics_archive(&zip_path, mode))
       		die_errno(_("unable to create diagnostics archive %s"),
       			  zip_path.buf);
       
      
     - ## contrib/scalar/scalar.c ##
     -@@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 		goto diagnose_cleanup;
     - 	}
     - 
     --	res = create_diagnostics_archive(&zip_path);
     -+	res = create_diagnostics_archive(&zip_path, 1);
     - 
     - diagnose_cleanup:
     - 	strbuf_release(&zip_path);
     -
       ## diagnose.c ##
     -@@ diagnose.c: static int add_directory_to_archiver(struct strvec *archiver_args,
     - 	return res;
     - }
     +@@
     + #include "object-store.h"
     + #include "packfile.h"
       
     --int create_diagnostics_archive(struct strbuf *zip_path)
     -+int create_diagnostics_archive(struct strbuf *zip_path, int include_everything)
     ++struct diagnose_option {
     ++	enum diagnose_mode mode;
     ++	const char *option_name;
     ++};
     ++
     ++static struct diagnose_option diagnose_options[] = {
     ++	{ DIAGNOSE_STATS, "stats" },
     ++	{ DIAGNOSE_ALL, "all" },
     ++};
     ++
     ++int option_parse_diagnose(const struct option *opt, const char *arg, int unset)
     ++{
     ++	int i;
     ++	enum diagnose_mode *diagnose = opt->value;
     ++
     ++	if (!arg) {
     ++		*diagnose = unset ? DIAGNOSE_NONE : DIAGNOSE_STATS;
     ++		return 0;
     ++	}
     ++
     ++	for (i = 0; i < ARRAY_SIZE(diagnose_options); i++) {
     ++		if (!strcmp(arg, diagnose_options[i].option_name)) {
     ++			*diagnose = diagnose_options[i].mode;
     ++			return 0;
     ++		}
     ++	}
     ++
     ++	return error(_("invalid --%s value '%s'"), opt->long_name, arg);
     ++}
     ++
     + static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     + 				   const char *file_name, void *data)
       {
     - 	struct strvec archiver_args = STRVEC_INIT;
     - 	char **argv_copy = NULL;
     -@@ diagnose.c: int create_diagnostics_archive(struct strbuf *zip_path)
     - 	loose_objs_stats(&buf, ".git/objects");
     - 	strvec_push(&archiver_args, buf.buf);
     - 
     --	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     --	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     --	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     --	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     --	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
     -+	/* Only include this if explicitly requested */
     -+	if (include_everything &&
     -+	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
     - 		goto diagnose_cleanup;
     - 
     - 	strvec_pushl(&archiver_args, "--prefix=",
      
       ## diagnose.h ##
     -@@
     - #include "cache.h"
     - #include "strbuf.h"
     +@@ diagnose.h: enum diagnose_mode {
     + 	DIAGNOSE_ALL
     + };
       
     --int create_diagnostics_archive(struct strbuf *zip_path);
     -+int create_diagnostics_archive(struct strbuf *zip_path, int include_everything);
     ++int option_parse_diagnose(const struct option *opt, const char *arg, int unset);
     ++
     + int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode);
       
       #endif /* DIAGNOSE_H */
      
       ## t/t0092-diagnose.sh ##
      @@ t/t0092-diagnose.sh: test_expect_success UNZIP 'creates diagnostics zip archive' '
     - 	grep ".git/objects" out &&
       
     - 	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
     + 	# Should not include .git directory contents by default
     + 	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
      -	grep "^Total: [0-9][0-9]*" out
     -+	grep "^Total: [0-9][0-9]*" out &&
     ++'
     ++
     ++test_expect_success UNZIP '--mode=stats excludes .git dir contents' '
     ++	test_when_finished rm -rf report &&
      +
     -+	# Should not include .git directory contents
     ++	git diagnose -o report -s test --mode=stats >out &&
     ++
     ++	# Includes pack quantity/size info
     ++	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
     ++	grep ".git/objects" out &&
     ++
     ++	# Does not include .git directory contents
      +	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
      +'
      +
     -+test_expect_success UNZIP '--all includes .git data in archive' '
     ++test_expect_success UNZIP '--mode=all includes .git dir contents' '
      +	test_when_finished rm -rf report &&
      +
     -+	git diagnose -o report -s test --all >out &&
     ++	git diagnose -o report -s test --mode=all >out &&
     ++
     ++	# Includes pack quantity/size info
     ++	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
     ++	grep ".git/objects" out &&
      +
     -+	# Should include .git directory contents
     ++	# Includes .git directory contents
      +	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
      +
      +	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
  8:  d81e7c10997 !  9:  1a1eb2c9806 builtin/bugreport.c: create '--diagnose' option
     @@ Commit message
          provide additional context to readers, ideally reducing some back-and-forth
          between reporters and those debugging the issue.
      
     -    Note that '--diagnose' may take an optional string arg (either 'basic' or
     -    'all'). If specified without the arg or with 'basic', the behavior
     -    corresponds to running 'git diagnose' without '--all'; this default is meant
     -    to help reduce unintentional leaking of sensitive information). However, a
     -    user can still manually specify '--diagnose=all' to generate the equivalent
     -    archive to one created with 'git diagnose --all'.
     +    Note that '--diagnose' may take an optional string arg (either 'stats' or
     +    'all'). If specified without the arg, the behavior corresponds to running
     +    'git diagnose' without '--mode'. As with 'git diagnose', this default is
     +    intended to help reduce unintentional leaking of sensitive information).
     +    Users can also explicitly specify '--diagnose=(stats|all)' to generate the
     +    respective archive created by 'git diagnose --mode=(stats|all)'.
      
          Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-bugreport.txt ##
     @@ Documentation/git-bugreport.txt: SYNOPSIS
       --------
       [verse]
       'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
     -+		[--diagnose[=(basic|all)]]
     ++		[--diagnose[=<mode>]]
       
       DESCRIPTION
       -----------
     @@ Documentation/git-bugreport.txt: OPTIONS
       	named 'git-bugreport-<formatted suffix>'. This should take the form of a
       	strftime(3) format string; the current local time will be used.
       
     -+--diagnose[=(basic|all)]::
     -+	Create a zip archive of information about the repository including logs
     -+	and certain statistics describing the data shape of the repository. The
     -+	archive is written to the same output directory as the bug report and is
     -+	named 'git-diagnostics-<formatted suffix>'.
     ++--no-diagnose::
     ++--diagnose[=<mode>]::
     ++	Create a zip archive of supplemental information about the user's
     ++	machine, Git client, and repository state. The archive is written to the
     ++	same output directory as the bug report and is named
     ++	'git-diagnostics-<formatted suffix>'.
      ++
     -+By default, `--diagnose` (equivalent to `--diagnose=basic`) will collect only
     -+statistics and summarized data about the repository and filesystem. Specifying
     -+`--diagnose=all` will create an archive with the same contents generated by `git
     -+diagnose --all`; this archive will be much larger, and will contain potentially
     -+sensitive information about the repository. See linkgit:git-diagnose[1] for more
     -+details on the contents of the diagnostic archive.
     ++Without `mode` specified, the diagnostic archive will contain the default set of
     ++statistics reported by `git diagnose`. An optional `mode` value may be specified
     ++to change which information is included in the archive. See
     ++linkgit:git-diagnose[1] for the list of valid values for `mode` and details
     ++about their usage.
      +
       GIT
       ---
     @@ builtin/bugreport.c
       #include "hook-list.h"
      +#include "diagnose.h"
       
     -+enum diagnose_mode {
     -+	DIAGNOSE_NONE,
     -+	DIAGNOSE_BASIC,
     -+	DIAGNOSE_ALL
     -+};
       
       static void get_system_info(struct strbuf *sys_info)
     - {
     -@@ builtin/bugreport.c: static void get_header(struct strbuf *buf, const char *title)
     - 	strbuf_addf(buf, "\n\n[%s]\n", title);
     +@@ builtin/bugreport.c: static void get_populated_hooks(struct strbuf *hook_info, int nongit)
       }
       
     -+static int option_parse_diagnose(const struct option *opt,
     -+				 const char *arg, int unset)
     -+{
     -+	enum diagnose_mode *diagnose = opt->value;
     -+
     -+	BUG_ON_OPT_NEG(unset);
     -+
     -+	if (!arg || !strcmp(arg, "basic"))
     -+		*diagnose = DIAGNOSE_BASIC;
     -+	else if (!strcmp(arg, "all"))
     -+		*diagnose = DIAGNOSE_ALL;
     -+	else
     -+		die(_("diagnose mode must be either 'basic' or 'all'"));
     -+
     -+	return 0;
     -+}
     -+
     - int cmd_bugreport(int argc, const char **argv, const char *prefix)
     - {
     - 	struct strbuf buffer = STRBUF_INIT;
     + static const char * const bugreport_usage[] = {
     +-	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
     ++	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
     + 	NULL
     + };
     + 
      @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *prefix)
       	int report = -1;
       	time_t now = time(NULL);
     @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *
      +	size_t output_path_len;
       
       	const struct option bugreport_options[] = {
     -+		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("(basic|all)"),
     -+			       N_("create an additional zip archive of detailed diagnostics"),
     -+			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG, option_parse_diagnose),
     ++		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("mode"),
     ++			       N_("create an additional zip archive of detailed diagnostics (default 'stats')"),
     ++			       PARSE_OPT_OPTARG, option_parse_diagnose),
       		OPT_STRING('o', "output-directory", &option_output, N_("path"),
      -			   N_("specify a destination for the bugreport file")),
      +			   N_("specify a destination for the bugreport file(s)")),
     @@ builtin/bugreport.c: int cmd_bugreport(int argc, const char **argv, const char *
      +		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
      +		strbuf_addstr(&zip_path, ".zip");
      +
     -+		if (create_diagnostics_archive(&zip_path, diagnose == DIAGNOSE_ALL))
     ++		if (create_diagnostics_archive(&zip_path, diagnose))
      +			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
      +
      +		strbuf_release(&zip_path);
     @@ t/t0091-bugreport.sh: test_expect_success 'indicates populated hooks' '
      +	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
      +'
      +
     -+test_expect_success UNZIP '--diagnose=basic excludes .git dir contents' '
     ++test_expect_success UNZIP '--diagnose=stats excludes .git dir contents' '
      +	test_when_finished rm -rf report &&
      +
     -+	git bugreport --diagnose=basic -o report -s test >out &&
     ++	git bugreport --diagnose=stats -o report -s test >out &&
     ++
     ++	# Includes pack quantity/size info
     ++	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
     ++	grep ".git/objects" out &&
      +
     -+	# Should not include .git directory contents
     ++	# Does not include .git directory contents
      +	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
      +'
      +
     @@ t/t0091-bugreport.sh: test_expect_success 'indicates populated hooks' '
      +
      +	git bugreport --diagnose=all -o report -s test >out &&
      +
     -+	# Should include .git directory contents
     ++	# Includes .git directory contents
      +	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
      +
      +	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
  9:  6834bdcaea8 ! 10:  d22674752f0 scalar-diagnose: use 'git diagnose --all'
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    scalar-diagnose: use 'git diagnose --all'
     +    scalar-diagnose: use 'git diagnose --mode=all'
      
          Replace implementation of 'scalar diagnose' with an internal invocation of
     -    'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
     -    by making it a direct alias of 'git diagnose' and removes some code in
     -    'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
     -    alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
     -    of 'git diagnose'), if that is desired in the future.
     +    'git diagnose --mode=all'. This simplifies the implementation of
     +    'cmd_diagnose' by making it a direct alias of 'git diagnose' and removes
     +    some code in 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The
     +    simplicity of the alias also sets up a clean deprecation path for 'scalar
     +    diagnose' (in favor of 'git diagnose'), if that is desired in the future.
      
          This introduces one minor change to the output of 'scalar diagnose', which
          is that the prefix of the created zip archive is changed from 'scalar_' to
          'git-diagnostics-'.
      
     +    Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## contrib/scalar/scalar.c ##
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      +	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
      +	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
       
     --	res = create_diagnostics_archive(&zip_path, 1);
     -+	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
     -+		    "-o", diagnostics_root.buf, NULL) < 0)
     -+		res = -1;
     +-	res = create_diagnostics_archive(&zip_path, DIAGNOSE_ALL);
     ++	res = run_git("diagnose", "--mode=all", "-s", "%Y%m%d_%H%M%S",
     ++		      "-o", diagnostics_root.buf, NULL);
       
      -diagnose_cleanup:
      -	strbuf_release(&zip_path);
 10:  14925c3feed = 11:  b64475f5b17 scalar: update technical doc roadmap

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 01/11] scalar-diagnose: use "$GIT_UNZIP" in test
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
tests on systems where 'unzip' is not in the system path.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/t/t9099-scalar.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 10b1172a8aa..fac86a57550 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -109,14 +109,14 @@ test_expect_success UNZIP 'scalar diagnose' '
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-	unzip -v "$zip_path" &&
+	"$GIT_UNZIP" -v "$zip_path" &&
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
-	unzip -p "$zip_path" diagnostics.log >out &&
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
-	unzip -p "$zip_path" packs-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
 	grep "$(pwd)/.git/objects" out &&
-	unzip -p "$zip_path" objects-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
 	grep "^Total: [1-9]" out
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 02/11] scalar-diagnose: avoid 32-bit overflow of size_t
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Avoid 32-bit size_t overflow when reporting the available disk space in
'get_disk_info' by casting the block size and available block count to
'off_t' before multiplying them. Without this change, 'st_mult' would
(correctly) report a size_t overflow on 32-bit systems at or exceeding 2^32
bytes of available space.

Note that 'off_t' is a 64-bit integer even on 32-bit systems due to the
inclusion of '#define _FILE_OFFSET_BITS 64' in 'git-compat-util.h' (see
b97e911643 (Support for large files on 32bit systems., 2007-02-17)).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 97e71fe19cd..04046452284 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -348,7 +348,7 @@ static int get_disk_info(struct strbuf *out)
 	}
 
 	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
 	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
 	strbuf_release(&buf);
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 03/11] scalar-diagnose: add directory to archiver more gently
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If a directory added to the 'scalar diagnose' archiver does not exist, warn
and return 0 from 'add_directory_to_archiver()' rather than failing with a
fatal error. This handles a failure edge case where the '.git/logs' has not
yet been created when running 'scalar diagnose', but extends to any
situation where a directory may be missing in the '.git' dir.

Now, when a directory is missing a warning is captured in the diagnostic
logs. This provides a user with more complete information than if 'scalar
diagnose' simply failed with an error.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 04046452284..b9092f0b612 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 					  const char *path, int recurse)
 {
 	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
+	DIR *dir;
 	struct dirent *e;
 	struct strbuf buf = STRBUF_INIT;
 	size_t len;
 	int res = 0;
 
-	if (!dir)
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
 		return error_errno(_("could not open directory '%s'"), path);
+	}
 
 	if (!at_root)
 		strbuf_addf(&buf, "%s/", path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/'
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move 'get_disk_info()' function into 'compat/'. Although Scalar-specific
code is generally not part of the main Git tree, 'get_disk_info()' will be
used in subsequent patches by additional callers beyond 'scalar diagnose'.
This patch prepares for that change, at which point this platform-specific
code should be part of 'compat/' as a matter of convention.

The function is copied *mostly* verbatim, with two exceptions:

* '#ifdef WIN32' is replaced with '#ifdef GIT_WINDOWS_NATIVE' to allow
  'statvfs' to be used with Cygwin.
* the 'struct strbuf buf' and 'int res' (as well as their corresponding
  cleanup & return) are moved outside of the '#ifdef' block.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 compat/disk.h           | 56 +++++++++++++++++++++++++++++++++++++++++
 contrib/scalar/scalar.c | 53 +-------------------------------------
 git-compat-util.h       |  1 +
 3 files changed, 58 insertions(+), 52 deletions(-)
 create mode 100644 compat/disk.h

diff --git a/compat/disk.h b/compat/disk.h
new file mode 100644
index 00000000000..50a32e3d8a4
--- /dev/null
+++ b/compat/disk.h
@@ -0,0 +1,56 @@
+#ifndef COMPAT_DISK_H
+#define COMPAT_DISK_H
+
+#include "git-compat-util.h"
+
+static int get_disk_info(struct strbuf *out)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int res = 0;
+
+#ifdef GIT_WINDOWS_NATIVE
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+#else
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+#endif
+
+cleanup:
+	strbuf_release(&buf);
+	return res;
+}
+
+#endif /* COMPAT_DISK_H */
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index b9092f0b612..607fedefd82 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -13,6 +13,7 @@
 #include "help.h"
 #include "archive.h"
 #include "object-store.h"
+#include "compat/disk.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -309,58 +310,6 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-#ifndef WIN32
-#include <sys/statvfs.h>
-#endif
-
-static int get_disk_info(struct strbuf *out)
-{
-#ifdef WIN32
-	struct strbuf buf = STRBUF_INIT;
-	char volume_name[MAX_PATH], fs_name[MAX_PATH];
-	DWORD serial_number, component_length, flags;
-	ULARGE_INTEGER avail2caller, total, avail;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
-		error(_("could not determine free disk size for '%s'"),
-		      buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_setlen(&buf, offset_1st_component(buf.buf));
-	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
-				   &serial_number, &component_length, &flags,
-				   fs_name, sizeof(fs_name))) {
-		error(_("could not get info for '%s'"), buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, avail2caller.QuadPart);
-	strbuf_addch(out, '\n');
-	strbuf_release(&buf);
-#else
-	struct strbuf buf = STRBUF_INIT;
-	struct statvfs stat;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (statvfs(buf.buf, &stat) < 0) {
-		error_errno(_("could not determine free disk size for '%s'"),
-			    buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
-	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
-	strbuf_release(&buf);
-#endif
-	return 0;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 58d7708296b..9a62e3a0d2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -258,6 +258,7 @@ static inline int is_xplatform_dir_sep(int c)
 #include <sys/resource.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/statvfs.h>
 #include <termios.h>
 #ifndef NO_SYS_SELECT_H
 #include <sys/select.h>
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 05/11] scalar-diagnose: move functionality to common location
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the core functionality of 'scalar diagnose' into a new 'diagnose.[c,h]'
library to prepare for new callers in the main Git tree generating
diagnostic archives. These callers will be introduced in subsequent patches.

While this patch appears large, it is mostly made up of moving code out of
'scalar.c' and into 'diagnose.c'. Specifically, the functions

- dir_file_stats_objects()
- dir_file_stats()
- count_files()
- loose_objs_stats()
- add_directory_to_archiver()

are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
function is a mostly identical (partial) copy of 'cmd_diagnose()', with the
primary changes being that 'zip_path' is an input and "Enlistment root" is
corrected to "Repository root" in the archiver log.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                |   1 +
 contrib/scalar/scalar.c | 202 +------------------------------------
 diagnose.c              | 217 ++++++++++++++++++++++++++++++++++++++++
 diagnose.h              |   8 ++
 4 files changed, 228 insertions(+), 200 deletions(-)
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h

diff --git a/Makefile b/Makefile
index 2ec9b2dc6bb..ed66cb70e5a 100644
--- a/Makefile
+++ b/Makefile
@@ -932,6 +932,7 @@ LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += delta-islands.o
+LIB_OBJS += diagnose.o
 LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 607fedefd82..3983def760a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,9 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "archive.h"
-#include "object-store.h"
-#include "compat/disk.h"
+#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -263,53 +261,6 @@ static int unregister_dir(void)
 	return res;
 }
 
-static int add_directory_to_archiver(struct strvec *archiver_args,
-					  const char *path, int recurse)
-{
-	int at_root = !*path;
-	DIR *dir;
-	struct dirent *e;
-	struct strbuf buf = STRBUF_INIT;
-	size_t len;
-	int res = 0;
-
-	dir = opendir(at_root ? "." : path);
-	if (!dir) {
-		if (errno == ENOENT) {
-			warning(_("could not archive missing directory '%s'"), path);
-			return 0;
-		}
-		return error_errno(_("could not open directory '%s'"), path);
-	}
-
-	if (!at_root)
-		strbuf_addf(&buf, "%s/", path);
-	len = buf.len;
-	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
-
-	while (!res && (e = readdir(dir))) {
-		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
-			continue;
-
-		strbuf_setlen(&buf, len);
-		strbuf_addstr(&buf, e->d_name);
-
-		if (e->d_type == DT_REG)
-			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
-		else if (e->d_type != DT_DIR)
-			warning(_("skipping '%s', which is neither file nor "
-				  "directory"), buf.buf);
-		else if (recurse &&
-			 add_directory_to_archiver(archiver_args,
-						   buf.buf, recurse) < 0)
-			res = -1;
-	}
-
-	closedir(dir);
-	strbuf_release(&buf);
-	return res;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -550,83 +501,6 @@ cleanup:
 	return res;
 }
 
-static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
-				   const char *file_name, void *data)
-{
-	struct strbuf *buf = data;
-	struct stat st;
-
-	if (!stat(full_path, &st))
-		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
-			    (uintmax_t)st.st_size);
-}
-
-static int dir_file_stats(struct object_directory *object_dir, void *data)
-{
-	struct strbuf *buf = data;
-
-	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
-
-	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
-				  data);
-
-	return 0;
-}
-
-static int count_files(char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count = 0;
-
-	if (!dir)
-		return 0;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
-			count++;
-
-	closedir(dir);
-	return count;
-}
-
-static void loose_objs_stats(struct strbuf *buf, const char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count;
-	int total = 0;
-	unsigned char c;
-	struct strbuf count_path = STRBUF_INIT;
-	size_t base_path_len;
-
-	if (!dir)
-		return;
-
-	strbuf_addstr(buf, "Object directory stats for ");
-	strbuf_add_absolute_path(buf, path);
-	strbuf_addstr(buf, ":\n");
-
-	strbuf_add_absolute_path(&count_path, path);
-	strbuf_addch(&count_path, '/');
-	base_path_len = count_path.len;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) &&
-		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
-		    !hex_to_bytes(&c, e->d_name, 1)) {
-			strbuf_setlen(&count_path, base_path_len);
-			strbuf_addstr(&count_path, e->d_name);
-			total += (count = count_files(count_path.buf));
-			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
-		}
-
-	strbuf_addf(buf, "Total: %d loose objects", total);
-
-	strbuf_release(&count_path);
-	closedir(dir);
-}
-
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -637,12 +511,8 @@ static int cmd_diagnose(int argc, const char **argv)
 		NULL
 	};
 	struct strbuf zip_path = STRBUF_INIT;
-	struct strvec archiver_args = STRVEC_INIT;
-	char **argv_copy = NULL;
-	int stdout_fd = -1, archiver_fd = -1;
 	time_t now = time(NULL);
 	struct tm tm;
-	struct strbuf buf = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
@@ -663,79 +533,11 @@ static int cmd_diagnose(int argc, const char **argv)
 			    zip_path.buf);
 		goto diagnose_cleanup;
 	}
-	stdout_fd = dup(1);
-	if (stdout_fd < 0) {
-		res = error_errno(_("could not duplicate stdout"));
-		goto diagnose_cleanup;
-	}
-
-	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
-	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
-		res = error_errno(_("could not redirect output"));
-		goto diagnose_cleanup;
-	}
-
-	init_zip_archiver();
-	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
-	get_version_info(&buf, 1);
-
-	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
-	get_disk_info(&buf);
-	write_or_die(stdout_fd, buf.buf, buf.len);
-	strvec_pushf(&archiver_args,
-		     "--add-virtual-file=diagnostics.log:%.*s",
-		     (int)buf.len, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
-	dir_file_stats(the_repository->objects->odb, &buf);
-	foreach_alt_odb(dir_file_stats, &buf);
-	strvec_push(&archiver_args, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
-	loose_objs_stats(&buf, ".git/objects");
-	strvec_push(&archiver_args, buf.buf);
-
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
-		goto diagnose_cleanup;
-
-	strvec_pushl(&archiver_args, "--prefix=",
-		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
-
-	/* `write_archive()` modifies the `argv` passed to it. Let it. */
-	argv_copy = xmemdupz(archiver_args.v,
-			     sizeof(char *) * archiver_args.nr);
-	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
-			    the_repository, NULL, 0);
-	if (res) {
-		error(_("failed to write archive"));
-		goto diagnose_cleanup;
-	}
 
-	if (!res)
-		fprintf(stderr, "\n"
-		       "Diagnostics complete.\n"
-		       "All of the gathered info is captured in '%s'\n",
-		       zip_path.buf);
+	res = create_diagnostics_archive(&zip_path);
 
 diagnose_cleanup:
-	if (archiver_fd >= 0) {
-		close(1);
-		dup2(stdout_fd, 1);
-	}
-	free(argv_copy);
-	strvec_clear(&archiver_args);
 	strbuf_release(&zip_path);
-	strbuf_release(&buf);
-
 	return res;
 }
 
diff --git a/diagnose.c b/diagnose.c
new file mode 100644
index 00000000000..509d582f0ea
--- /dev/null
+++ b/diagnose.c
@@ -0,0 +1,217 @@
+#include "cache.h"
+#include "diagnose.h"
+#include "compat/disk.h"
+#include "archive.h"
+#include "dir.h"
+#include "help.h"
+#include "strvec.h"
+#include "object-store.h"
+#include "packfile.h"
+
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
+static int add_directory_to_archiver(struct strvec *archiver_args,
+				     const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir;
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
+		return error_errno(_("could not open directory '%s'"), path);
+	}
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+int create_diagnostics_archive(struct strbuf *zip_path)
+{
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	struct strbuf buf = STRBUF_INIT;
+	int res;
+
+	stdout_fd = dup(STDOUT_FILENO);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (dup2(archiver_fd, STDOUT_FILENO) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "git-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+			"Diagnostics complete.\n"
+			"All of the gathered info is captured in '%s'\n",
+			zip_path->buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		dup2(stdout_fd, STDOUT_FILENO);
+		close(stdout_fd);
+		close(archiver_fd);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&buf);
+
+	return res;
+}
diff --git a/diagnose.h b/diagnose.h
new file mode 100644
index 00000000000..06dca69bdac
--- /dev/null
+++ b/diagnose.h
@@ -0,0 +1,8 @@
+#ifndef DIAGNOSE_H
+#define DIAGNOSE_H
+
+#include "strbuf.h"
+
+int create_diagnostics_archive(struct strbuf *zip_path);
+
+#endif /* DIAGNOSE_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 06/11] diagnose.c: add option to configure archive contents
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-11  0:16       ` Junio C Hamano
  2022-08-11 10:51       ` Ævar Arnfjörð Bjarmason
  2022-08-10 23:34     ` [PATCH v3 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
                       ` (5 subsequent siblings)
  11 siblings, 2 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'create_diagnostics_archive()' to take an argument 'mode'. When
archiving diagnostics for a repository, 'mode' is used to selectively
include/exclude information based on its value. The initial options for
'mode' are:

* DIAGNOSE_NONE: do not collect any diagnostics or create an archive
  (no-op).
* DIAGNOSE_STATS: collect basic repository metadata (Git version, repo path,
  filesystem available space) as well as sizing and count statistics for the
  repository's objects and packfiles.
* DIAGNOSE_ALL: collect basic repository metadata, sizing/count statistics,
  and copies of the '.git', '.git/hooks', '.git/info', '.git/logs', and
  '.git/objects/info' directories.

These modes are introduced to provide users the option to collect
diagnostics without the sensitive information included in copies of '.git'
dir contents. At the moment, only 'scalar diagnose' uses
'create_diagnostics_archive()' (with a hardcoded 'DIAGNOSE_ALL' mode to
match existing functionality), but more callers will be introduced in
subsequent patches.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c |  2 +-
 diagnose.c              | 19 +++++++++++++------
 diagnose.h              |  9 ++++++++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 3983def760a..d538b8b8f14 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -534,7 +534,7 @@ static int cmd_diagnose(int argc, const char **argv)
 		goto diagnose_cleanup;
 	}
 
-	res = create_diagnostics_archive(&zip_path);
+	res = create_diagnostics_archive(&zip_path, DIAGNOSE_ALL);
 
 diagnose_cleanup:
 	strbuf_release(&zip_path);
diff --git a/diagnose.c b/diagnose.c
index 509d582f0ea..aadc3d4b21f 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -132,7 +132,7 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-int create_diagnostics_archive(struct strbuf *zip_path)
+int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode)
 {
 	struct strvec archiver_args = STRVEC_INIT;
 	char **argv_copy = NULL;
@@ -140,6 +140,11 @@ int create_diagnostics_archive(struct strbuf *zip_path)
 	struct strbuf buf = STRBUF_INIT;
 	int res;
 
+	if (mode == DIAGNOSE_NONE) {
+		res = 0;
+		goto diagnose_cleanup;
+	}
+
 	stdout_fd = dup(STDOUT_FILENO);
 	if (stdout_fd < 0) {
 		res = error_errno(_("could not duplicate stdout"));
@@ -177,11 +182,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
 	loose_objs_stats(&buf, ".git/objects");
 	strvec_push(&archiver_args, buf.buf);
 
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+	/* Only include this if explicitly requested */
+	if (mode == DIAGNOSE_ALL &&
+	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
 		goto diagnose_cleanup;
 
 	strvec_pushl(&archiver_args, "--prefix=",
diff --git a/diagnose.h b/diagnose.h
index 06dca69bdac..9bb6049bf0c 100644
--- a/diagnose.h
+++ b/diagnose.h
@@ -2,7 +2,14 @@
 #define DIAGNOSE_H
 
 #include "strbuf.h"
+#include "parse-options.h"
 
-int create_diagnostics_archive(struct strbuf *zip_path);
+enum diagnose_mode {
+	DIAGNOSE_NONE,
+	DIAGNOSE_STATS,
+	DIAGNOSE_ALL
+};
+
+int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode);
 
 #endif /* DIAGNOSE_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 07/11] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
                       ` (4 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.

The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 .gitignore                     |  1 +
 Documentation/git-diagnose.txt | 50 +++++++++++++++++++++++++++++
 Makefile                       |  1 +
 builtin.h                      |  1 +
 builtin/diagnose.c             | 57 ++++++++++++++++++++++++++++++++++
 git.c                          |  1 +
 t/t0092-diagnose.sh            | 32 +++++++++++++++++++
 7 files changed, 143 insertions(+)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100755 t/t0092-diagnose.sh

diff --git a/.gitignore b/.gitignore
index 42fd7253b44..80b530bbed2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,6 +53,7 @@
 /git-cvsimport
 /git-cvsserver
 /git-daemon
+/git-diagnose
 /git-diff
 /git-diff-files
 /git-diff-index
diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
new file mode 100644
index 00000000000..ce07dd0725d
--- /dev/null
+++ b/Documentation/git-diagnose.txt
@@ -0,0 +1,50 @@
+git-diagnose(1)
+================
+
+NAME
+----
+git-diagnose - Generate a zip archive of diagnostic information
+
+SYNOPSIS
+--------
+[verse]
+'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+
+DESCRIPTION
+-----------
+Collects detailed information about the user's machine, Git client, and
+repository state and packages that information into a zip archive. The
+generated archive can then, for example, be shared with the Git mailing list to
+help debug an issue or serve as a reference for independent debugging.
+
+The following information is captured in the archive:
+
+  * 'git version --build-options'
+  * The path to the repository root
+  * The available disk space on the filesystem
+  * The name and size of each packfile, including those in alternate object
+    stores
+  * The total count of loose objects, as well as counts broken down by
+    `.git/objects` subdirectory
+
+This tool differs from linkgit:git-bugreport[1] in that it collects much more
+detailed information with a greater focus on reporting the size and data shape
+of repository contents.
+
+OPTIONS
+-------
+-o <path>::
+--output-directory <path>::
+	Place the resulting diagnostics archive in `<path>` instead of the
+	current directory.
+
+-s <format>::
+--suffix <format>::
+	Specify an alternate suffix for the diagnostics archive name, to create
+	a file named 'git-diagnostics-<formatted suffix>'. This should take the
+	form of a strftime(3) format string; the current local time will be
+	used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ed66cb70e5a..d34f680c065 100644
--- a/Makefile
+++ b/Makefile
@@ -1153,6 +1153,7 @@ BUILTIN_OBJS += builtin/credential-cache.o
 BUILTIN_OBJS += builtin/credential-store.o
 BUILTIN_OBJS += builtin/credential.o
 BUILTIN_OBJS += builtin/describe.o
+BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-tree.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc8485..8901a34d6bf 100644
--- a/builtin.h
+++ b/builtin.h
@@ -144,6 +144,7 @@ int cmd_credential_cache(int argc, const char **argv, const char *prefix);
 int cmd_credential_cache_daemon(int argc, const char **argv, const char *prefix);
 int cmd_credential_store(int argc, const char **argv, const char *prefix);
 int cmd_describe(int argc, const char **argv, const char *prefix);
+int cmd_diagnose(int argc, const char **argv, const char *prefix);
 int cmd_diff_files(int argc, const char **argv, const char *prefix);
 int cmd_diff_index(int argc, const char **argv, const char *prefix);
 int cmd_diff(int argc, const char **argv, const char *prefix);
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
new file mode 100644
index 00000000000..832493bba65
--- /dev/null
+++ b/builtin/diagnose.c
@@ -0,0 +1,57 @@
+#include "builtin.h"
+#include "parse-options.h"
+#include "diagnose.h"
+
+static const char * const diagnose_usage[] = {
+	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
+	NULL
+};
+
+int cmd_diagnose(int argc, const char **argv, const char *prefix)
+{
+	struct strbuf zip_path = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	char *option_output = NULL;
+	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *prefixed_filename;
+
+	const struct option diagnose_options[] = {
+		OPT_STRING('o', "output-directory", &option_output, N_("path"),
+			   N_("specify a destination for the diagnostics archive")),
+		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
+			   N_("specify a strftime format suffix for the filename")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, diagnose_options,
+			     diagnose_usage, 0);
+
+	/* Prepare the path to put the result */
+	prefixed_filename = prefix_filename(prefix,
+					    option_output ? option_output : "");
+	strbuf_addstr(&zip_path, prefixed_filename);
+	strbuf_complete(&zip_path, '/');
+
+	strbuf_addstr(&zip_path, "git-diagnostics-");
+	strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_OK:
+	case SCLD_EXISTS:
+		break;
+	default:
+		die_errno(_("could not create leading directories for '%s'"),
+			  zip_path.buf);
+	}
+
+	/* Prepare diagnostics */
+	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
+		die_errno(_("unable to create diagnostics archive %s"),
+			  zip_path.buf);
+
+	free(prefixed_filename);
+	strbuf_release(&zip_path);
+	return 0;
+}
diff --git a/git.c b/git.c
index e5d62fa5a92..0b9d8ef7677 100644
--- a/git.c
+++ b/git.c
@@ -522,6 +522,7 @@ static struct cmd_struct commands[] = {
 	{ "credential-cache--daemon", cmd_credential_cache_daemon },
 	{ "credential-store", cmd_credential_store },
 	{ "describe", cmd_describe, RUN_SETUP },
+	{ "diagnose", cmd_diagnose, RUN_SETUP_GENTLY },
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
new file mode 100755
index 00000000000..b6923726fd7
--- /dev/null
+++ b/t/t0092-diagnose.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='git diagnose'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success UNZIP 'creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test >out &&
+	grep "Available space" out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents by default
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+	grep "^Total: [0-9][0-9]*" out
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 08/11] builtin/diagnose.c: add '--mode' option
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally
select non-default diagnostic information to include in the output archive.
Additionally, document the currently-available modes, emphasizing the
importance of not sharing a '--mode=all' archive publicly due to the
presence of sensitive information.

Note that the option parsing callback - 'option_parse_diagnose()' - is added
to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in
future callers configuring a diagnostics archive.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-diagnose.txt | 17 ++++++++++++++++-
 builtin/diagnose.c             |  8 ++++++--
 diagnose.c                     | 30 ++++++++++++++++++++++++++++++
 diagnose.h                     |  2 ++
 t/t0092-diagnose.sh            | 30 +++++++++++++++++++++++++++++-
 5 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
index ce07dd0725d..3ec8cc7ad72 100644
--- a/Documentation/git-diagnose.txt
+++ b/Documentation/git-diagnose.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+	       [--mode=<mode>]
 
 DESCRIPTION
 -----------
@@ -17,7 +18,7 @@ repository state and packages that information into a zip archive. The
 generated archive can then, for example, be shared with the Git mailing list to
 help debug an issue or serve as a reference for independent debugging.
 
-The following information is captured in the archive:
+By default, the following information is captured in the archive:
 
   * 'git version --build-options'
   * The path to the repository root
@@ -27,6 +28,9 @@ The following information is captured in the archive:
   * The total count of loose objects, as well as counts broken down by
     `.git/objects` subdirectory
 
+Additional information can be collected by selecting a different diagnostic mode
+using the `--mode` option.
+
 This tool differs from linkgit:git-bugreport[1] in that it collects much more
 detailed information with a greater focus on reporting the size and data shape
 of repository contents.
@@ -45,6 +49,17 @@ OPTIONS
 	form of a strftime(3) format string; the current local time will be
 	used.
 
+--mode=(stats|all)::
+	Specify the type of diagnostics that should be collected. The default behavior
+	of 'git diagnose' is equivalent to `--mode=stats`.
++
+The `--mode=all` option collects everything included in `--mode=stats`, as well
+as copies of `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
+`.git/objects/info` directories. This additional information may be sensitive,
+as it can be used to reconstruct the full contents of the diagnosed repository.
+Users should exercise caution when sharing an archive generated with
+`--mode=all`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
index 832493bba65..cd260c20155 100644
--- a/builtin/diagnose.c
+++ b/builtin/diagnose.c
@@ -3,7 +3,7 @@
 #include "diagnose.h"
 
 static const char * const diagnose_usage[] = {
-	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
+	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>] [--mode=<mode>]"),
 	NULL
 };
 
@@ -12,6 +12,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	struct strbuf zip_path = STRBUF_INIT;
 	time_t now = time(NULL);
 	struct tm tm;
+	enum diagnose_mode mode = DIAGNOSE_STATS;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	char *prefixed_filename;
@@ -21,6 +22,9 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 			   N_("specify a destination for the diagnostics archive")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
 			   N_("specify a strftime format suffix for the filename")),
+		OPT_CALLBACK_F(0, "mode", &mode, N_("(stats|all)"),
+			       N_("specify the content of the diagnostic archive"),
+			       PARSE_OPT_NONEG, option_parse_diagnose),
 		OPT_END()
 	};
 
@@ -47,7 +51,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	}
 
 	/* Prepare diagnostics */
-	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
+	if (create_diagnostics_archive(&zip_path, mode))
 		die_errno(_("unable to create diagnostics archive %s"),
 			  zip_path.buf);
 
diff --git a/diagnose.c b/diagnose.c
index aadc3d4b21f..00c3e9438e2 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -8,6 +8,36 @@
 #include "object-store.h"
 #include "packfile.h"
 
+struct diagnose_option {
+	enum diagnose_mode mode;
+	const char *option_name;
+};
+
+static struct diagnose_option diagnose_options[] = {
+	{ DIAGNOSE_STATS, "stats" },
+	{ DIAGNOSE_ALL, "all" },
+};
+
+int option_parse_diagnose(const struct option *opt, const char *arg, int unset)
+{
+	int i;
+	enum diagnose_mode *diagnose = opt->value;
+
+	if (!arg) {
+		*diagnose = unset ? DIAGNOSE_NONE : DIAGNOSE_STATS;
+		return 0;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(diagnose_options); i++) {
+		if (!strcmp(arg, diagnose_options[i].option_name)) {
+			*diagnose = diagnose_options[i].mode;
+			return 0;
+		}
+	}
+
+	return error(_("invalid --%s value '%s'"), opt->long_name, arg);
+}
+
 static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
 				   const char *file_name, void *data)
 {
diff --git a/diagnose.h b/diagnose.h
index 9bb6049bf0c..7a4951a7863 100644
--- a/diagnose.h
+++ b/diagnose.h
@@ -10,6 +10,8 @@ enum diagnose_mode {
 	DIAGNOSE_ALL
 };
 
+int option_parse_diagnose(const struct option *opt, const char *arg, int unset);
+
 int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode);
 
 #endif /* DIAGNOSE_H */
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
index b6923726fd7..fca9b58489c 100755
--- a/t/t0092-diagnose.sh
+++ b/t/t0092-diagnose.sh
@@ -26,7 +26,35 @@ test_expect_success UNZIP 'creates diagnostics zip archive' '
 
 	# Should not include .git directory contents by default
 	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
-	grep "^Total: [0-9][0-9]*" out
+'
+
+test_expect_success UNZIP '--mode=stats excludes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test --mode=stats >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Does not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--mode=all includes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test --mode=all >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Includes .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-11 10:53       ` Ævar Arnfjörð Bjarmason
  2022-08-10 23:34     ` [PATCH v3 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
                       ` (2 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.

The '--diagnose' option behaves effectively as an alias for simultaneously
running 'git bugreport' and 'git diagnose'. In the documentation, users are
explicitly recommended to attach the diagnostics alongside a bug report to
provide additional context to readers, ideally reducing some back-and-forth
between reporters and those debugging the issue.

Note that '--diagnose' may take an optional string arg (either 'stats' or
'all'). If specified without the arg, the behavior corresponds to running
'git diagnose' without '--mode'. As with 'git diagnose', this default is
intended to help reduce unintentional leaking of sensitive information).
Users can also explicitly specify '--diagnose=(stats|all)' to generate the
respective archive created by 'git diagnose --mode=(stats|all)'.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-bugreport.txt | 18 +++++++++++++
 builtin/bugreport.c             | 27 ++++++++++++++++---
 t/t0091-bugreport.sh            | 48 +++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
index d8817bf3cec..eca726e5791 100644
--- a/Documentation/git-bugreport.txt
+++ b/Documentation/git-bugreport.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+		[--diagnose[=<mode>]]
 
 DESCRIPTION
 -----------
@@ -31,6 +32,10 @@ The following information is captured automatically:
  - A list of enabled hooks
  - $SHELL
 
+Additional information may be gathered into a separate zip archive using the
+`--diagnose` option, and can be attached alongside the bugreport document to
+provide additional context to readers.
+
 This tool is invoked via the typical Git setup process, which means that in some
 cases, it might not be able to launch - for example, if a relevant config file
 is unreadable. In this kind of scenario, it may be helpful to manually gather
@@ -49,6 +54,19 @@ OPTIONS
 	named 'git-bugreport-<formatted suffix>'. This should take the form of a
 	strftime(3) format string; the current local time will be used.
 
+--no-diagnose::
+--diagnose[=<mode>]::
+	Create a zip archive of supplemental information about the user's
+	machine, Git client, and repository state. The archive is written to the
+	same output directory as the bug report and is named
+	'git-diagnostics-<formatted suffix>'.
++
+Without `mode` specified, the diagnostic archive will contain the default set of
+statistics reported by `git diagnose`. An optional `mode` value may be specified
+to change which information is included in the archive. See
+linkgit:git-diagnose[1] for the list of valid values for `mode` and details
+about their usage.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 9de32bc96e7..530895be55f 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -5,6 +5,7 @@
 #include "compat/compiler.h"
 #include "hook.h"
 #include "hook-list.h"
+#include "diagnose.h"
 
 
 static void get_system_info(struct strbuf *sys_info)
@@ -59,7 +60,7 @@ static void get_populated_hooks(struct strbuf *hook_info, int nongit)
 }
 
 static const char * const bugreport_usage[] = {
-	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
 	NULL
 };
 
@@ -98,16 +99,21 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	int report = -1;
 	time_t now = time(NULL);
 	struct tm tm;
+	enum diagnose_mode diagnose = DIAGNOSE_NONE;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
+	size_t output_path_len;
 
 	const struct option bugreport_options[] = {
+		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("mode"),
+			       N_("create an additional zip archive of detailed diagnostics (default 'stats')"),
+			       PARSE_OPT_OPTARG, option_parse_diagnose),
 		OPT_STRING('o', "output-directory", &option_output, N_("path"),
-			   N_("specify a destination for the bugreport file")),
+			   N_("specify a destination for the bugreport file(s)")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
-			   N_("specify a strftime format suffix for the filename")),
+			   N_("specify a strftime format suffix for the filename(s)")),
 		OPT_END()
 	};
 
@@ -119,6 +125,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 					    option_output ? option_output : "");
 	strbuf_addstr(&report_path, prefixed_filename);
 	strbuf_complete(&report_path, '/');
+	output_path_len = report_path.len;
 
 	strbuf_addstr(&report_path, "git-bugreport-");
 	strbuf_addftime(&report_path, option_suffix, localtime_r(&now, &tm), 0, 0);
@@ -133,6 +140,20 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		    report_path.buf);
 	}
 
+	/* Prepare diagnostics, if requested */
+	if (diagnose != DIAGNOSE_NONE) {
+		struct strbuf zip_path = STRBUF_INIT;
+		strbuf_add(&zip_path, report_path.buf, output_path_len);
+		strbuf_addstr(&zip_path, "git-diagnostics-");
+		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+		strbuf_addstr(&zip_path, ".zip");
+
+		if (create_diagnostics_archive(&zip_path, diagnose))
+			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
+
+		strbuf_release(&zip_path);
+	}
+
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
index 08f5fe9caef..b6d2f591acd 100755
--- a/t/t0091-bugreport.sh
+++ b/t/t0091-bugreport.sh
@@ -78,4 +78,52 @@ test_expect_success 'indicates populated hooks' '
 	test_cmp expect actual
 '
 
+test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents by default
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=stats excludes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=stats -o report -s test >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Does not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=all includes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=all -o report -s test >out &&
+
+	# Includes .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 10/11] scalar-diagnose: use 'git diagnose --mode=all'
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (8 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-10 23:34     ` [PATCH v3 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Replace implementation of 'scalar diagnose' with an internal invocation of
'git diagnose --mode=all'. This simplifies the implementation of
'cmd_diagnose' by making it a direct alias of 'git diagnose' and removes
some code in 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The
simplicity of the alias also sets up a clean deprecation path for 'scalar
diagnose' (in favor of 'git diagnose'), if that is desired in the future.

This introduces one minor change to the output of 'scalar diagnose', which
is that the prefix of the created zip archive is changed from 'scalar_' to
'git-diagnostics-'.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d538b8b8f14..68571ce195f 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,7 +11,6 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -510,34 +509,19 @@ static int cmd_diagnose(int argc, const char **argv)
 		N_("scalar diagnose [<enlistment>]"),
 		NULL
 	};
-	struct strbuf zip_path = STRBUF_INIT;
-	time_t now = time(NULL);
-	struct tm tm;
+	struct strbuf diagnostics_root = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
 			     usage, 0);
 
-	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
-
-	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
-	strbuf_addftime(&zip_path,
-			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
-	strbuf_addstr(&zip_path, ".zip");
-	switch (safe_create_leading_directories(zip_path.buf)) {
-	case SCLD_EXISTS:
-	case SCLD_OK:
-		break;
-	default:
-		error_errno(_("could not create directory for '%s'"),
-			    zip_path.buf);
-		goto diagnose_cleanup;
-	}
+	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
+	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
 
-	res = create_diagnostics_archive(&zip_path, DIAGNOSE_ALL);
+	res = run_git("diagnose", "--mode=all", "-s", "%Y%m%d_%H%M%S",
+		      "-o", diagnostics_root.buf, NULL);
 
-diagnose_cleanup:
-	strbuf_release(&zip_path);
+	strbuf_release(&diagnostics_root);
 	return res;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 11/11] scalar: update technical doc roadmap
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (9 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
@ 2022-08-10 23:34     ` Victoria Dye via GitGitGadget
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  11 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-10 23:34 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update the Scalar roadmap to reflect the completion of generalizing 'scalar
diagnose' into 'git diagnose' and 'git bugreport --diagnose'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/technical/scalar.txt | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
index 08bc09c225a..f6353375f08 100644
--- a/Documentation/technical/scalar.txt
+++ b/Documentation/technical/scalar.txt
@@ -84,6 +84,9 @@ series have been accepted:
 
 - `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
 
+- `scalar-generalize-diagnose`: Move the functionality of `scalar diagnose`
+  into `git diagnose` and `git bugreport --diagnose`.
+
 Roughly speaking (and subject to change), the following series are needed to
 "finish" this initial version of Scalar:
 
@@ -91,12 +94,6 @@ Roughly speaking (and subject to change), the following series are needed to
   and implement `scalar help`. At the end of this series, Scalar should be
   feature-complete from the perspective of a user.
 
-- Generalize features not specific to Scalar: In the spirit of making Scalar
-  configure only what is needed for large repo performance, move common
-  utilities into other parts of Git. Some of this will be internal-only, but one
-  major change will be generalizing `scalar diagnose` for use with any Git
-  repository.
-
 - Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
   `git`, including updates to build and install it with the rest of Git. This
   change will incorporate Scalar into the Git CI and test framework, as well as
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/11] diagnose.c: add option to configure archive contents
  2022-08-10 23:34     ` [PATCH v3 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
@ 2022-08-11  0:16       ` Junio C Hamano
  2022-08-12 17:00         ` Victoria Dye
  2022-08-11 10:51       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-11  0:16 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -177,11 +182,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>  	loose_objs_stats(&buf, ".git/objects");
>  	strvec_push(&archiver_args, buf.buf);
>  
> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +	/* Only include this if explicitly requested */
> +	if (mode == DIAGNOSE_ALL &&
> +	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
>  		goto diagnose_cleanup;

At first glance, it looks as if this part fails silently, but
add_directory_to_archiver() states what failed there, so we show
necessary error messages and do not silently fail, which is good.

There is a "failed to write archive" message after write_archive()
call returns non-zero, but presumably write_archive() itself gives
diagnostics (like "oh, I was told to archive this file but I cannot
read it") when it does so, so in a sense, giving the concluding
"failed to write" only in that case might make the error messages
uneven.  If we fail to enlist ".git/hooks" directory, we may want to
say why we failed to do so, and then want to see the concluding
"failed to write" at the end, just like the case where write_archive()
failed.

It is a truely minor point, and if it turns out to be worth fixing,
it can be easily done by moving the diagnose_clean_up label a bit
higher, i.e.

	...
	res = write_archive(...);

diagnose_cleanup:
	if (res)
		error(_("failed to write archive"));
	else
        	fprintf(stderr, "\n"
			"Diagnostics complete.\n"
			"All of the gathered info is captured in '%s'\n",
			zip_path->buf);

	if (archiver_fd >= 0) {
		... restore FD#1 and close stdout_fd and archiver_fd
	}
	...


Other than that, this new patch looks good to me.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/11] diagnose.c: add option to configure archive contents
  2022-08-10 23:34     ` [PATCH v3 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
  2022-08-11  0:16       ` Junio C Hamano
@ 2022-08-11 10:51       ` Ævar Arnfjörð Bjarmason
  2022-08-11 15:43         ` Victoria Dye
  1 sibling, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-11 10:51 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Wed, Aug 10 2022, Victoria Dye via GitGitGadget wrote:

> index 06dca69bdac..9bb6049bf0c 100644
> --- a/diagnose.h
> +++ b/diagnose.h
> @@ -2,7 +2,14 @@
>  #define DIAGNOSE_H
>  
>  #include "strbuf.h"
> +#include "parse-options.h"

This is a stray include that isn't needed at this point, some mistake,
or needed by a subsequent patch?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option
  2022-08-10 23:34     ` [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-11 10:53       ` Ævar Arnfjörð Bjarmason
  2022-08-11 15:40         ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-11 10:53 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Wed, Aug 10 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Create a '--diagnose' option for 'git bugreport' to collect additional
> information about the repository and write it to a zipped archive.
> [...]
>  'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
> +		[--diagnose[=<mode>]]
> [...]
>  static const char * const bugreport_usage[] = {
> -	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
> +	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
>  	NULL
>  };

This still has the SYNOPSIS v.s. -h discrepancy noted in
https://lore.kernel.org/git/220804.86v8r8ec4s.gmgdl@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-05 19:38       ` Derrick Stolee
@ 2022-08-11 11:06         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-11 11:06 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Victoria Dye via GitGitGadget, git, johannes.schindelin, Victoria Dye


On Fri, Aug 05 2022, Derrick Stolee wrote:

> On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>>> repository diagnostics.
>> 
>> It's good to have this as a built-in separate from "git bugreport",
>> but...
>> 
>>> +git-diagnose - Generate a zip archive of diagnostic information
>> 
>> ...I'd really prefer for this not to squat on such a common name we
>> might regret having reserved later for such very specific
>> functionality. I'd think e.g. these would be better:
>> 
>> 	git mk-diagnostics-zip
>> 
>> Or maybe:
>> 
>> 	git archive-interesting-for-report
>
> These are not realistic replacements.

Maybe:

	git diagnose create-zip

?

>> If I had to guess what a "git diagnose" did, I'd probably think:
>> 
>>  * It analyzes your config, and suggests redundancies/alternatives
>>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>>    the commit-graph writing.
>
> These sound like great options to add in the future, such as:
>
>    --perf-test: Run performance tests on your repository using different
>    Git config options and recommend certain settings.
>
> (This --perf-test option would be a great way to get wider adoption
> of parallel checkout, since its optimal settings are so machine
> dependent.)

...

> The thing is, even if we did these other things, it would result in
> some kind of document that summarizes the repository shape and features.
> That kind of data is exactly what this version of 'git diagnose' does.

I think a command like "git diagnose" that had options to do other
unrelated stuff, but by default created a zip archive when given no
options would be rather confusing.

Yes, it makes sense to emit some human-readable summary, but to zip it
up as well? That's something we just need for the "git bugreport"
case...

> For now, it leaves the human reader responsible for making decisions
> based on those documents, but they have been incredibly helpful when we
> are _diagnosing_ issues users are having with their repositories.

This is orthagonal to what I'm pointing out. You're basically saying the
user can just read the documentation to find out what this built-in
does.

That's true, what I'm pointing out is that it's unfortunate that such
highly specific functionality is squatting on such a short & generic
name, but just e.g. adding a "create-zip" sub-command would address
that.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option
  2022-08-11 10:53       ` Ævar Arnfjörð Bjarmason
@ 2022-08-11 15:40         ` Victoria Dye
  2022-08-11 20:30           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-11 15:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 10 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a '--diagnose' option for 'git bugreport' to collect additional
>> information about the repository and write it to a zipped archive.
>> [...]
>>  'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
>> +		[--diagnose[=<mode>]]
>> [...]
>>  static const char * const bugreport_usage[] = {
>> -	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
>> +	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
>>  	NULL
>>  };
> 
> This still has the SYNOPSIS v.s. -h discrepancy noted in
> https://lore.kernel.org/git/220804.86v8r8ec4s.gmgdl@evledraar.gmail.com/

The discrepancy you pointed out was on 'git diagnose' (which has since been
fixed), this is a pre-existing one in 'git bugreport'. I decided against
fixing *this* one because it didn't really fit into any of the patches in
this series, so it would need its own patch. When balancing "leave things
better than you found them" vs. "stay focused on the purpose of the series",
I leaned towards the latter to avoid setting a precedent for other 'git
bugreport'-related scope creep.

If you have the patches to audit this sort of thing, I think a nice place to
fix this might be in a dedicated series fixing discrepancies tree-wide. Even
better, you could include the patches in your tree that detect them as part
of e.g. the 'static-analysis' CI job.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/11] diagnose.c: add option to configure archive contents
  2022-08-11 10:51       ` Ævar Arnfjörð Bjarmason
@ 2022-08-11 15:43         ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-11 15:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin

Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 10 2022, Victoria Dye via GitGitGadget wrote:
> 
>> index 06dca69bdac..9bb6049bf0c 100644
>> --- a/diagnose.h
>> +++ b/diagnose.h
>> @@ -2,7 +2,14 @@
>>  #define DIAGNOSE_H
>>  
>>  #include "strbuf.h"
>> +#include "parse-options.h"
> 
> This is a stray include that isn't needed at this point, some mistake,
> or needed by a subsequent patch?

It's needed by patch 8 [1]. If I re-roll again, I'll move this '#include' to
that patch. Thanks!

[1] https://lore.kernel.org/git/3da0cb725c927d08dd9486286e06bdb76896f5b7.1660174473.git.gitgitgadget@gmail.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option
  2022-08-11 15:40         ` Victoria Dye
@ 2022-08-11 20:30           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-11 20:30 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee, johannes.schindelin


On Thu, Aug 11 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Aug 10 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Create a '--diagnose' option for 'git bugreport' to collect additional
>>> information about the repository and write it to a zipped archive.
>>> [...]
>>>  'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
>>> +		[--diagnose[=<mode>]]
>>> [...]
>>>  static const char * const bugreport_usage[] = {
>>> -	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
>>> +	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
>>>  	NULL
>>>  };
>> 
>> This still has the SYNOPSIS v.s. -h discrepancy noted in
>> https://lore.kernel.org/git/220804.86v8r8ec4s.gmgdl@evledraar.gmail.com/
>
> The discrepancy you pointed out was on 'git diagnose' (which has since been
> fixed),

Ah, sorry. I missed that & conflated the two.

> this is a pre-existing one in 'git bugreport'. I decided against
> fixing *this* one because it didn't really fit into any of the patches in
> this series, so it would need its own patch. When balancing "leave things
> better than you found them" vs. "stay focused on the purpose of the series",
> I leaned towards the latter to avoid setting a precedent for other 'git
> bugreport'-related scope creep.

In any case, I'm pointing out the difference in one of them having
\n-wrapping inconsistent with the other, which is an addition in this
series, sorry about not being clear.

I see that there's also the difference in how they format "--suffix",
but that's pre-existing & we can leave it for now. I think that's what
you're pointing out here as pre-existing.

> If you have the patches to audit this sort of thing, I think a nice place to
> fix this might be in a dedicated series fixing discrepancies tree-wide. Even
> better, you could include the patches in your tree that detect them as part
> of e.g. the 'static-analysis' CI job.

Yeah I do have those, and will probably submit those sooner than later,
and I'll end up spotting differences once they land on "master"
(e.g. [1] is one such case).

But this one is just one I eyeballed during review.

1. https://lore.kernel.org/git/220811.86o7wrov26.gmgdl@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/11] diagnose.c: add option to configure archive contents
  2022-08-11  0:16       ` Junio C Hamano
@ 2022-08-12 17:00         ` Victoria Dye
  0 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye @ 2022-08-12 17:00 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -177,11 +182,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>>  	loose_objs_stats(&buf, ".git/objects");
>>  	strvec_push(&archiver_args, buf.buf);
>>  
>> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
>> +	/* Only include this if explicitly requested */
>> +	if (mode == DIAGNOSE_ALL &&
>> +	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>> +	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>> +	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>> +	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>> +	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
>>  		goto diagnose_cleanup;
> 
> At first glance, it looks as if this part fails silently, but
> add_directory_to_archiver() states what failed there, so we show
> necessary error messages and do not silently fail, which is good.
> 
> There is a "failed to write archive" message after write_archive()
> call returns non-zero, but presumably write_archive() itself gives
> diagnostics (like "oh, I was told to archive this file but I cannot
> read it") when it does so, so in a sense, giving the concluding
> "failed to write" only in that case might make the error messages
> uneven.  If we fail to enlist ".git/hooks" directory, we may want to
> say why we failed to do so, and then want to see the concluding
> "failed to write" at the end, just like the case where write_archive()
> failed.
> 
> It is a truely minor point, and if it turns out to be worth fixing,
> it can be easily done by moving the diagnose_clean_up label a bit
> higher, i.e.
> 
> 	...
> 	res = write_archive(...);
> 
> diagnose_cleanup:
> 	if (res)
> 		error(_("failed to write archive"));
> 	else
>         	fprintf(stderr, "\n"
> 			"Diagnostics complete.\n"
> 			"All of the gathered info is captured in '%s'\n",
> 			zip_path->buf);
> 
> 	if (archiver_fd >= 0) {
> 		... restore FD#1 and close stdout_fd and archiver_fd
> 	}
> 	...
>

I like this idea, since I think there's value in indicating both the cause
("could not open directory") and effect ("failed to write archive") of the
error. I'll include this and [1] in a re-roll. Thanks!

[1] https://lore.kernel.org/git/9d1b0cb9-5c21-c101-8597-2fe166cb6abe@github.com/

> 
> Other than that, this new patch looks good to me.
> 
> Thanks.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'
  2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
                       ` (10 preceding siblings ...)
  2022-08-10 23:34     ` [PATCH v3 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
@ 2022-08-12 20:10     ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
                         ` (10 more replies)
  11 siblings, 11 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

As part of the preparation for moving Scalar out of 'contrib/' and into Git,
this series moves the functionality of 'scalar diagnose' into a new builtin
('git diagnose') and a new option ('--diagnose') for 'git bugreport'. This
change further aligns Scalar with the objective [1] of having it only
contain functionality and settings that benefit large Git repositories, but
not all repositories. The diagnostics reported by 'scalar diagnose' relevant
for investigating issues in any Git repository, so generating them should be
part of a "normal" Git builtin.

The series is organized as follows:

 * Miscellaneous fixes for the existing 'scalar diagnose' implementation
 * Moving the code for generating diagnostics into a common location in the
   Git tree
 * Implementing 'git diagnose'
 * Implementing 'git bugreport --diagnose'
 * Updating the Scalar roadmap

Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
'git bugreport --diagnose', it is not being deprecated in this series.
Although deprecation -> removal could be a future cleanup effort, 'scalar
diagnose' is kept around for now as an alias for users already accustomed to
using it in 'scalar'.


Changes since V3
================

 * Moved 'parse-options.h' import in 'diagnose.h' into the commit where it's
   first needed.
 * Removed a redundant 'if (!res)' condition gating the final "Diagnostics
   complete" message in 'create_diagnostics_archive()'.
 * Improved error reporting in 'create_diagnostics_archive()'. I was
   originally going to modify the "failed to write archive" error to trigger
   whenever 'create_diagnostics_archive()' returned a nonzero value [2].
   However, while working on it I realized the message would no longer be
   tied to a failure of 'write_archive()', making it less helpful in
   pinpointing an issue. To address the original issue
   ('add_directory_to_archiver()' silently failing in
   'create_diagnostics_archive()'), I instead refactored those calls into a
   loop and added the error message. Now, there's exactly one error message
   printed for each possible early exit scenario from
   'create_diagnostics_archive()', hopefully avoiding both redundancy &
   under-reporting.


Changes since V2
================

 * Replaced 'int include_everything' arg to 'create_diagnostic_archive()'
   with 'enum diagnose_mode mode'.
 * Replaced '--all' with configurable '--mode' option in 'git diagnose';
   moved 'option_parse_diagnose()' into 'diagnose.c' so that it can be used
   for both 'git bugreport --diagnose' and 'git diagnose --mode'.
 * Split "builtin/diagnose.c: gate certain data behind '--all'" (formerly
   patch 7/10) into "diagnose.c: add option to configure archive contents"
   (patch 6/11) and "builtin/diagnose.c: add '--mode' option" (patch 8/11).
 * Added '--no-diagnose' for 'git bugreport'. I was initially going to use
   '--diagnose=none', but '--no-diagnose' was easier to configure when using
   the shared 'option_parse_diagnose()' function .
 * Updated usage strings, option descriptions, and documentation files for
   'mode' option. To avoid needing to keep multiple lists of valid 'mode'
   values up-to-date, format mode value as <mode> everywhere except option
   description in 'git-diagnose.txt', where the values are listed. The
   documentation of '--diagnose' in 'git-bugreport.txt' links to
   'git-diagnose.txt' and explicitly calls out that details on 'mode' can be
   found there.
 * Reworded 'git diagnose' and 'git bugreport' command & option
   documentation.
 * Added additional checks to 't0091-bugreport.sh' and 't0092-diagnose.sh'
   tests
 * Moved '#include "cache.h" from 'diagnose.h' to 'diagnose.c'.
 * Fixed '--output-directory' usage string in 'builtin/diagnose.c'.
 * Replaced 'die()' with 'die_errno()' in error triggered when leading
   directories of archive cannot be created.
 * Changed hardcoded '-1' error exit code in 'scalar diagnose' to returning
   the exit code from 'git diagnose --mode=all'.


Changes since V1
================

 * Reorganized patches to fix minor issues (e.g., more gently adding
   directories to the archive) of 'scalar diagnose' in 'scalar.c', before
   the code is moved out of that file.
 * (Almost) entirely redesigned the UI for generating diagnostics. The new
   approach avoids cluttering 'git bugreport' with a mode that doesn't
   actually generate a report. Now, there are distinct options for different
   use cases: generating extra diagnostics with a bug report ('git bugreport
   --diagnose') and generating diagnostics for personal debugging/addition
   to an existing bug report ('git diagnose').
 * Moved 'get_disk_info()' into 'compat/'.
 * Moved 'create_diagnostics_archive()' into a new 'diagnose.c', as it now
   has multiple callers.
 * Updated command & option documentation to more clearly guide users on how
   to use the new options.
 * Added the '--all' (and '--diagnose=all') option to change the default
   behavior of diagnostics generation to exclude '.git' directory contents.
   For many bug reporters, this would reveal private repository contents
   they don't want to expose to the public mailing list. This has the added
   benefit of creating much smaller archives by default, which will be more
   likely to successfully send to the mailing list.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/pull.1275.v2.git.1657584367.gitgitgadget@gmail.com/
[2]
https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/

Victoria Dye (11):
  scalar-diagnose: use "$GIT_UNZIP" in test
  scalar-diagnose: avoid 32-bit overflow of size_t
  scalar-diagnose: add directory to archiver more gently
  scalar-diagnose: move 'get_disk_info()' to 'compat/'
  scalar-diagnose: move functionality to common location
  diagnose.c: add option to configure archive contents
  builtin/diagnose.c: create 'git diagnose' builtin
  builtin/diagnose.c: add '--mode' option
  builtin/bugreport.c: create '--diagnose' option
  scalar-diagnose: use 'git diagnose --mode=all'
  scalar: update technical doc roadmap

 .gitignore                         |   1 +
 Documentation/git-bugreport.txt    |  18 ++
 Documentation/git-diagnose.txt     |  65 +++++++
 Documentation/technical/scalar.txt |   9 +-
 Makefile                           |   2 +
 builtin.h                          |   1 +
 builtin/bugreport.c                |  27 ++-
 builtin/diagnose.c                 |  61 +++++++
 compat/disk.h                      |  56 ++++++
 contrib/scalar/scalar.c            | 271 +----------------------------
 contrib/scalar/t/t9099-scalar.sh   |   8 +-
 diagnose.c                         | 269 ++++++++++++++++++++++++++++
 diagnose.h                         |  17 ++
 git-compat-util.h                  |   1 +
 git.c                              |   1 +
 t/t0091-bugreport.sh               |  48 +++++
 t/t0092-diagnose.sh                |  60 +++++++
 17 files changed, 637 insertions(+), 278 deletions(-)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100644 compat/disk.h
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h
 create mode 100755 t/t0092-diagnose.sh


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1310%2Fvdye%2Fscalar%2Fgeneralize-diagnose-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1310/vdye/scalar/generalize-diagnose-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1310

Range-diff vs v3:

  1:  f5ceb9c7190 =  1:  f5ceb9c7190 scalar-diagnose: use "$GIT_UNZIP" in test
  2:  78a93eb95bb =  2:  78a93eb95bb scalar-diagnose: avoid 32-bit overflow of size_t
  3:  22ee8ea5a1e =  3:  22ee8ea5a1e scalar-diagnose: add directory to archiver more gently
  4:  18f2ba4e0cd =  4:  18f2ba4e0cd scalar-diagnose: move 'get_disk_info()' to 'compat/'
  5:  7a51fad87a8 !  5:  c19f3632d4f scalar-diagnose: move functionality to common location
     @@ diagnose.c (new)
      +		goto diagnose_cleanup;
      +	}
      +
     -+	if (!res)
     -+		fprintf(stderr, "\n"
     -+			"Diagnostics complete.\n"
     -+			"All of the gathered info is captured in '%s'\n",
     -+			zip_path->buf);
     ++	fprintf(stderr, "\n"
     ++		"Diagnostics complete.\n"
     ++		"All of the gathered info is captured in '%s'\n",
     ++		zip_path->buf);
      +
      +diagnose_cleanup:
      +	if (archiver_fd >= 0) {
  6:  0a6c55696d8 !  6:  710b67e5776 diagnose.c: add option to configure archive contents
     @@ Commit message
          match existing functionality), but more callers will be introduced in
          subsequent patches.
      
     +    Finally, refactor from a hardcoded set of 'add_directory_to_archiver()'
     +    calls to iterative invocations gated by 'DIAGNOSE_ALL'. This allows for
     +    easier future modification of the set of directories to archive and improves
     +    error reporting when 'add_directory_to_archiver()' fails.
     +
          Helped-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       	strbuf_release(&zip_path);
      
       ## diagnose.c ##
     +@@
     + #include "object-store.h"
     + #include "packfile.h"
     + 
     ++struct archive_dir {
     ++	const char *path;
     ++	int recursive;
     ++};
     ++
     + static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     + 				   const char *file_name, void *data)
     + {
      @@ diagnose.c: static int add_directory_to_archiver(struct strvec *archiver_args,
       	return res;
       }
     @@ diagnose.c: static int add_directory_to_archiver(struct strvec *archiver_args,
       {
       	struct strvec archiver_args = STRVEC_INIT;
       	char **argv_copy = NULL;
     -@@ diagnose.c: int create_diagnostics_archive(struct strbuf *zip_path)
     + 	int stdout_fd = -1, archiver_fd = -1;
       	struct strbuf buf = STRBUF_INIT;
     - 	int res;
     - 
     +-	int res;
     ++	int res, i;
     ++	struct archive_dir archive_dirs[] = {
     ++		{ ".git", 0 },
     ++		{ ".git/hooks", 0 },
     ++		{ ".git/info", 0 },
     ++		{ ".git/logs", 1 },
     ++		{ ".git/objects/info", 0 }
     ++	};
     ++
      +	if (mode == DIAGNOSE_NONE) {
      +		res = 0;
      +		goto diagnose_cleanup;
      +	}
     -+
     + 
       	stdout_fd = dup(STDOUT_FILENO);
       	if (stdout_fd < 0) {
     - 		res = error_errno(_("could not duplicate stdout"));
      @@ diagnose.c: int create_diagnostics_archive(struct strbuf *zip_path)
       	loose_objs_stats(&buf, ".git/objects");
       	strvec_push(&archiver_args, buf.buf);
     @@ diagnose.c: int create_diagnostics_archive(struct strbuf *zip_path)
      -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
      -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
      -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
     +-		goto diagnose_cleanup;
      +	/* Only include this if explicitly requested */
     -+	if (mode == DIAGNOSE_ALL &&
     -+	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     -+	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
     - 		goto diagnose_cleanup;
     ++	if (mode == DIAGNOSE_ALL) {
     ++		for (i = 0; i < ARRAY_SIZE(archive_dirs); i++) {
     ++			if (add_directory_to_archiver(&archiver_args,
     ++						      archive_dirs[i].path,
     ++						      archive_dirs[i].recursive)) {
     ++				res = error_errno(_("could not add directory '%s' to archiver"),
     ++						  archive_dirs[i].path);
     ++				goto diagnose_cleanup;
     ++			}
     ++		}
     ++	}
       
       	strvec_pushl(&archiver_args, "--prefix=",
     + 		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
      
       ## diagnose.h ##
      @@
     - #define DIAGNOSE_H
       
       #include "strbuf.h"
     -+#include "parse-options.h"
       
      -int create_diagnostics_archive(struct strbuf *zip_path);
      +enum diagnose_mode {
  7:  bf3c073a985 =  7:  b58d13325b2 builtin/diagnose.c: create 'git diagnose' builtin
  8:  3da0cb725c9 !  8:  82be069e5e9 builtin/diagnose.c: add '--mode' option
     @@ builtin/diagnose.c: int cmd_diagnose(int argc, const char **argv, const char *pr
       
      
       ## diagnose.c ##
     -@@
     - #include "object-store.h"
     - #include "packfile.h"
     +@@ diagnose.c: struct archive_dir {
     + 	int recursive;
     + };
       
      +struct diagnose_option {
      +	enum diagnose_mode mode;
     @@ diagnose.c
       {
      
       ## diagnose.h ##
     +@@
     + #define DIAGNOSE_H
     + 
     + #include "strbuf.h"
     ++#include "parse-options.h"
     + 
     + enum diagnose_mode {
     + 	DIAGNOSE_NONE,
      @@ diagnose.h: enum diagnose_mode {
       	DIAGNOSE_ALL
       };
  9:  1a1eb2c9806 =  9:  718e3f43484 builtin/bugreport.c: create '--diagnose' option
 10:  d22674752f0 = 10:  94b32eacdd5 scalar-diagnose: use 'git diagnose --mode=all'
 11:  b64475f5b17 = 11:  728f8b81fd0 scalar: update technical doc roadmap

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v4 01/11] scalar-diagnose: use "$GIT_UNZIP" in test
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
tests on systems where 'unzip' is not in the system path.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/t/t9099-scalar.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 10b1172a8aa..fac86a57550 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -109,14 +109,14 @@ test_expect_success UNZIP 'scalar diagnose' '
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-	unzip -v "$zip_path" &&
+	"$GIT_UNZIP" -v "$zip_path" &&
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
-	unzip -p "$zip_path" diagnostics.log >out &&
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
-	unzip -p "$zip_path" packs-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
 	grep "$(pwd)/.git/objects" out &&
-	unzip -p "$zip_path" objects-local.txt >out &&
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
 	grep "^Total: [1-9]" out
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 02/11] scalar-diagnose: avoid 32-bit overflow of size_t
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Avoid 32-bit size_t overflow when reporting the available disk space in
'get_disk_info' by casting the block size and available block count to
'off_t' before multiplying them. Without this change, 'st_mult' would
(correctly) report a size_t overflow on 32-bit systems at or exceeding 2^32
bytes of available space.

Note that 'off_t' is a 64-bit integer even on 32-bit systems due to the
inclusion of '#define _FILE_OFFSET_BITS 64' in 'git-compat-util.h' (see
b97e911643 (Support for large files on 32bit systems., 2007-02-17)).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 97e71fe19cd..04046452284 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -348,7 +348,7 @@ static int get_disk_info(struct strbuf *out)
 	}
 
 	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
 	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
 	strbuf_release(&buf);
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 03/11] scalar-diagnose: add directory to archiver more gently
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If a directory added to the 'scalar diagnose' archiver does not exist, warn
and return 0 from 'add_directory_to_archiver()' rather than failing with a
fatal error. This handles a failure edge case where the '.git/logs' has not
yet been created when running 'scalar diagnose', but extends to any
situation where a directory may be missing in the '.git' dir.

Now, when a directory is missing a warning is captured in the diagnostic
logs. This provides a user with more complete information than if 'scalar
diagnose' simply failed with an error.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 04046452284..b9092f0b612 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 					  const char *path, int recurse)
 {
 	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
+	DIR *dir;
 	struct dirent *e;
 	struct strbuf buf = STRBUF_INIT;
 	size_t len;
 	int res = 0;
 
-	if (!dir)
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
 		return error_errno(_("could not open directory '%s'"), path);
+	}
 
 	if (!at_root)
 		strbuf_addf(&buf, "%s/", path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/'
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move 'get_disk_info()' function into 'compat/'. Although Scalar-specific
code is generally not part of the main Git tree, 'get_disk_info()' will be
used in subsequent patches by additional callers beyond 'scalar diagnose'.
This patch prepares for that change, at which point this platform-specific
code should be part of 'compat/' as a matter of convention.

The function is copied *mostly* verbatim, with two exceptions:

* '#ifdef WIN32' is replaced with '#ifdef GIT_WINDOWS_NATIVE' to allow
  'statvfs' to be used with Cygwin.
* the 'struct strbuf buf' and 'int res' (as well as their corresponding
  cleanup & return) are moved outside of the '#ifdef' block.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 compat/disk.h           | 56 +++++++++++++++++++++++++++++++++++++++++
 contrib/scalar/scalar.c | 53 +-------------------------------------
 git-compat-util.h       |  1 +
 3 files changed, 58 insertions(+), 52 deletions(-)
 create mode 100644 compat/disk.h

diff --git a/compat/disk.h b/compat/disk.h
new file mode 100644
index 00000000000..50a32e3d8a4
--- /dev/null
+++ b/compat/disk.h
@@ -0,0 +1,56 @@
+#ifndef COMPAT_DISK_H
+#define COMPAT_DISK_H
+
+#include "git-compat-util.h"
+
+static int get_disk_info(struct strbuf *out)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int res = 0;
+
+#ifdef GIT_WINDOWS_NATIVE
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+#else
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		res = -1;
+		goto cleanup;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+#endif
+
+cleanup:
+	strbuf_release(&buf);
+	return res;
+}
+
+#endif /* COMPAT_DISK_H */
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index b9092f0b612..607fedefd82 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -13,6 +13,7 @@
 #include "help.h"
 #include "archive.h"
 #include "object-store.h"
+#include "compat/disk.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -309,58 +310,6 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-#ifndef WIN32
-#include <sys/statvfs.h>
-#endif
-
-static int get_disk_info(struct strbuf *out)
-{
-#ifdef WIN32
-	struct strbuf buf = STRBUF_INIT;
-	char volume_name[MAX_PATH], fs_name[MAX_PATH];
-	DWORD serial_number, component_length, flags;
-	ULARGE_INTEGER avail2caller, total, avail;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
-		error(_("could not determine free disk size for '%s'"),
-		      buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_setlen(&buf, offset_1st_component(buf.buf));
-	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
-				   &serial_number, &component_length, &flags,
-				   fs_name, sizeof(fs_name))) {
-		error(_("could not get info for '%s'"), buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, avail2caller.QuadPart);
-	strbuf_addch(out, '\n');
-	strbuf_release(&buf);
-#else
-	struct strbuf buf = STRBUF_INIT;
-	struct statvfs stat;
-
-	strbuf_realpath(&buf, ".", 1);
-	if (statvfs(buf.buf, &stat) < 0) {
-		error_errno(_("could not determine free disk size for '%s'"),
-			    buf.buf);
-		strbuf_release(&buf);
-		return -1;
-	}
-
-	strbuf_addf(out, "Available space on '%s': ", buf.buf);
-	strbuf_humanise_bytes(out, (off_t)stat.f_bsize * (off_t)stat.f_bavail);
-	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
-	strbuf_release(&buf);
-#endif
-	return 0;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 58d7708296b..9a62e3a0d2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -258,6 +258,7 @@ static inline int is_xplatform_dir_sep(int c)
 #include <sys/resource.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/statvfs.h>
 #include <termios.h>
 #ifndef NO_SYS_SELECT_H
 #include <sys/select.h>
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 05/11] scalar-diagnose: move functionality to common location
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:26         ` Junio C Hamano
  2022-08-12 20:10       ` [PATCH v4 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
                         ` (5 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the core functionality of 'scalar diagnose' into a new 'diagnose.[c,h]'
library to prepare for new callers in the main Git tree generating
diagnostic archives. These callers will be introduced in subsequent patches.

While this patch appears large, it is mostly made up of moving code out of
'scalar.c' and into 'diagnose.c'. Specifically, the functions

- dir_file_stats_objects()
- dir_file_stats()
- count_files()
- loose_objs_stats()
- add_directory_to_archiver()

are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()'
function is a mostly identical (partial) copy of 'cmd_diagnose()', with the
primary changes being that 'zip_path' is an input and "Enlistment root" is
corrected to "Repository root" in the archiver log.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                |   1 +
 contrib/scalar/scalar.c | 202 +------------------------------------
 diagnose.c              | 216 ++++++++++++++++++++++++++++++++++++++++
 diagnose.h              |   8 ++
 4 files changed, 227 insertions(+), 200 deletions(-)
 create mode 100644 diagnose.c
 create mode 100644 diagnose.h

diff --git a/Makefile b/Makefile
index 2ec9b2dc6bb..ed66cb70e5a 100644
--- a/Makefile
+++ b/Makefile
@@ -932,6 +932,7 @@ LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += delta-islands.o
+LIB_OBJS += diagnose.o
 LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 607fedefd82..3983def760a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,9 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "archive.h"
-#include "object-store.h"
-#include "compat/disk.h"
+#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -263,53 +261,6 @@ static int unregister_dir(void)
 	return res;
 }
 
-static int add_directory_to_archiver(struct strvec *archiver_args,
-					  const char *path, int recurse)
-{
-	int at_root = !*path;
-	DIR *dir;
-	struct dirent *e;
-	struct strbuf buf = STRBUF_INIT;
-	size_t len;
-	int res = 0;
-
-	dir = opendir(at_root ? "." : path);
-	if (!dir) {
-		if (errno == ENOENT) {
-			warning(_("could not archive missing directory '%s'"), path);
-			return 0;
-		}
-		return error_errno(_("could not open directory '%s'"), path);
-	}
-
-	if (!at_root)
-		strbuf_addf(&buf, "%s/", path);
-	len = buf.len;
-	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
-
-	while (!res && (e = readdir(dir))) {
-		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
-			continue;
-
-		strbuf_setlen(&buf, len);
-		strbuf_addstr(&buf, e->d_name);
-
-		if (e->d_type == DT_REG)
-			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
-		else if (e->d_type != DT_DIR)
-			warning(_("skipping '%s', which is neither file nor "
-				  "directory"), buf.buf);
-		else if (recurse &&
-			 add_directory_to_archiver(archiver_args,
-						   buf.buf, recurse) < 0)
-			res = -1;
-	}
-
-	closedir(dir);
-	strbuf_release(&buf);
-	return res;
-}
-
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -550,83 +501,6 @@ cleanup:
 	return res;
 }
 
-static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
-				   const char *file_name, void *data)
-{
-	struct strbuf *buf = data;
-	struct stat st;
-
-	if (!stat(full_path, &st))
-		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
-			    (uintmax_t)st.st_size);
-}
-
-static int dir_file_stats(struct object_directory *object_dir, void *data)
-{
-	struct strbuf *buf = data;
-
-	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
-
-	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
-				  data);
-
-	return 0;
-}
-
-static int count_files(char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count = 0;
-
-	if (!dir)
-		return 0;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
-			count++;
-
-	closedir(dir);
-	return count;
-}
-
-static void loose_objs_stats(struct strbuf *buf, const char *path)
-{
-	DIR *dir = opendir(path);
-	struct dirent *e;
-	int count;
-	int total = 0;
-	unsigned char c;
-	struct strbuf count_path = STRBUF_INIT;
-	size_t base_path_len;
-
-	if (!dir)
-		return;
-
-	strbuf_addstr(buf, "Object directory stats for ");
-	strbuf_add_absolute_path(buf, path);
-	strbuf_addstr(buf, ":\n");
-
-	strbuf_add_absolute_path(&count_path, path);
-	strbuf_addch(&count_path, '/');
-	base_path_len = count_path.len;
-
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name) &&
-		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
-		    !hex_to_bytes(&c, e->d_name, 1)) {
-			strbuf_setlen(&count_path, base_path_len);
-			strbuf_addstr(&count_path, e->d_name);
-			total += (count = count_files(count_path.buf));
-			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
-		}
-
-	strbuf_addf(buf, "Total: %d loose objects", total);
-
-	strbuf_release(&count_path);
-	closedir(dir);
-}
-
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -637,12 +511,8 @@ static int cmd_diagnose(int argc, const char **argv)
 		NULL
 	};
 	struct strbuf zip_path = STRBUF_INIT;
-	struct strvec archiver_args = STRVEC_INIT;
-	char **argv_copy = NULL;
-	int stdout_fd = -1, archiver_fd = -1;
 	time_t now = time(NULL);
 	struct tm tm;
-	struct strbuf buf = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
@@ -663,79 +533,11 @@ static int cmd_diagnose(int argc, const char **argv)
 			    zip_path.buf);
 		goto diagnose_cleanup;
 	}
-	stdout_fd = dup(1);
-	if (stdout_fd < 0) {
-		res = error_errno(_("could not duplicate stdout"));
-		goto diagnose_cleanup;
-	}
-
-	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
-	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
-		res = error_errno(_("could not redirect output"));
-		goto diagnose_cleanup;
-	}
-
-	init_zip_archiver();
-	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
-	get_version_info(&buf, 1);
-
-	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
-	get_disk_info(&buf);
-	write_or_die(stdout_fd, buf.buf, buf.len);
-	strvec_pushf(&archiver_args,
-		     "--add-virtual-file=diagnostics.log:%.*s",
-		     (int)buf.len, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
-	dir_file_stats(the_repository->objects->odb, &buf);
-	foreach_alt_odb(dir_file_stats, &buf);
-	strvec_push(&archiver_args, buf.buf);
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
-	loose_objs_stats(&buf, ".git/objects");
-	strvec_push(&archiver_args, buf.buf);
-
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
-		goto diagnose_cleanup;
-
-	strvec_pushl(&archiver_args, "--prefix=",
-		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
-
-	/* `write_archive()` modifies the `argv` passed to it. Let it. */
-	argv_copy = xmemdupz(archiver_args.v,
-			     sizeof(char *) * archiver_args.nr);
-	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
-			    the_repository, NULL, 0);
-	if (res) {
-		error(_("failed to write archive"));
-		goto diagnose_cleanup;
-	}
 
-	if (!res)
-		fprintf(stderr, "\n"
-		       "Diagnostics complete.\n"
-		       "All of the gathered info is captured in '%s'\n",
-		       zip_path.buf);
+	res = create_diagnostics_archive(&zip_path);
 
 diagnose_cleanup:
-	if (archiver_fd >= 0) {
-		close(1);
-		dup2(stdout_fd, 1);
-	}
-	free(argv_copy);
-	strvec_clear(&archiver_args);
 	strbuf_release(&zip_path);
-	strbuf_release(&buf);
-
 	return res;
 }
 
diff --git a/diagnose.c b/diagnose.c
new file mode 100644
index 00000000000..f0dcbfe1a2a
--- /dev/null
+++ b/diagnose.c
@@ -0,0 +1,216 @@
+#include "cache.h"
+#include "diagnose.h"
+#include "compat/disk.h"
+#include "archive.h"
+#include "dir.h"
+#include "help.h"
+#include "strvec.h"
+#include "object-store.h"
+#include "packfile.h"
+
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
+static int add_directory_to_archiver(struct strvec *archiver_args,
+				     const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir;
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
+		return error_errno(_("could not open directory '%s'"), path);
+	}
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+int create_diagnostics_archive(struct strbuf *zip_path)
+{
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	struct strbuf buf = STRBUF_INIT;
+	int res;
+
+	stdout_fd = dup(STDOUT_FILENO);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path->buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (dup2(archiver_fd, STDOUT_FILENO) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "git-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Repository root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	fprintf(stderr, "\n"
+		"Diagnostics complete.\n"
+		"All of the gathered info is captured in '%s'\n",
+		zip_path->buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		dup2(stdout_fd, STDOUT_FILENO);
+		close(stdout_fd);
+		close(archiver_fd);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&buf);
+
+	return res;
+}
diff --git a/diagnose.h b/diagnose.h
new file mode 100644
index 00000000000..06dca69bdac
--- /dev/null
+++ b/diagnose.h
@@ -0,0 +1,8 @@
+#ifndef DIAGNOSE_H
+#define DIAGNOSE_H
+
+#include "strbuf.h"
+
+int create_diagnostics_archive(struct strbuf *zip_path);
+
+#endif /* DIAGNOSE_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 06/11] diagnose.c: add option to configure archive contents
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:31         ` Junio C Hamano
  2022-08-12 20:10       ` [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
                         ` (4 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'create_diagnostics_archive()' to take an argument 'mode'. When
archiving diagnostics for a repository, 'mode' is used to selectively
include/exclude information based on its value. The initial options for
'mode' are:

* DIAGNOSE_NONE: do not collect any diagnostics or create an archive
  (no-op).
* DIAGNOSE_STATS: collect basic repository metadata (Git version, repo path,
  filesystem available space) as well as sizing and count statistics for the
  repository's objects and packfiles.
* DIAGNOSE_ALL: collect basic repository metadata, sizing/count statistics,
  and copies of the '.git', '.git/hooks', '.git/info', '.git/logs', and
  '.git/objects/info' directories.

These modes are introduced to provide users the option to collect
diagnostics without the sensitive information included in copies of '.git'
dir contents. At the moment, only 'scalar diagnose' uses
'create_diagnostics_archive()' (with a hardcoded 'DIAGNOSE_ALL' mode to
match existing functionality), but more callers will be introduced in
subsequent patches.

Finally, refactor from a hardcoded set of 'add_directory_to_archiver()'
calls to iterative invocations gated by 'DIAGNOSE_ALL'. This allows for
easier future modification of the set of directories to archive and improves
error reporting when 'add_directory_to_archiver()' fails.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c |  2 +-
 diagnose.c              | 39 +++++++++++++++++++++++++++++++--------
 diagnose.h              |  8 +++++++-
 3 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 3983def760a..d538b8b8f14 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -534,7 +534,7 @@ static int cmd_diagnose(int argc, const char **argv)
 		goto diagnose_cleanup;
 	}
 
-	res = create_diagnostics_archive(&zip_path);
+	res = create_diagnostics_archive(&zip_path, DIAGNOSE_ALL);
 
 diagnose_cleanup:
 	strbuf_release(&zip_path);
diff --git a/diagnose.c b/diagnose.c
index f0dcbfe1a2a..9270056db2f 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -8,6 +8,11 @@
 #include "object-store.h"
 #include "packfile.h"
 
+struct archive_dir {
+	const char *path;
+	int recursive;
+};
+
 static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
 				   const char *file_name, void *data)
 {
@@ -132,13 +137,25 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
-int create_diagnostics_archive(struct strbuf *zip_path)
+int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode)
 {
 	struct strvec archiver_args = STRVEC_INIT;
 	char **argv_copy = NULL;
 	int stdout_fd = -1, archiver_fd = -1;
 	struct strbuf buf = STRBUF_INIT;
-	int res;
+	int res, i;
+	struct archive_dir archive_dirs[] = {
+		{ ".git", 0 },
+		{ ".git/hooks", 0 },
+		{ ".git/info", 0 },
+		{ ".git/logs", 1 },
+		{ ".git/objects/info", 0 }
+	};
+
+	if (mode == DIAGNOSE_NONE) {
+		res = 0;
+		goto diagnose_cleanup;
+	}
 
 	stdout_fd = dup(STDOUT_FILENO);
 	if (stdout_fd < 0) {
@@ -177,12 +194,18 @@ int create_diagnostics_archive(struct strbuf *zip_path)
 	loose_objs_stats(&buf, ".git/objects");
 	strvec_push(&archiver_args, buf.buf);
 
-	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
-	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
-		goto diagnose_cleanup;
+	/* Only include this if explicitly requested */
+	if (mode == DIAGNOSE_ALL) {
+		for (i = 0; i < ARRAY_SIZE(archive_dirs); i++) {
+			if (add_directory_to_archiver(&archiver_args,
+						      archive_dirs[i].path,
+						      archive_dirs[i].recursive)) {
+				res = error_errno(_("could not add directory '%s' to archiver"),
+						  archive_dirs[i].path);
+				goto diagnose_cleanup;
+			}
+		}
+	}
 
 	strvec_pushl(&archiver_args, "--prefix=",
 		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
diff --git a/diagnose.h b/diagnose.h
index 06dca69bdac..998775857a0 100644
--- a/diagnose.h
+++ b/diagnose.h
@@ -3,6 +3,12 @@
 
 #include "strbuf.h"
 
-int create_diagnostics_archive(struct strbuf *zip_path);
+enum diagnose_mode {
+	DIAGNOSE_NONE,
+	DIAGNOSE_STATS,
+	DIAGNOSE_ALL
+};
+
+int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode);
 
 #endif /* DIAGNOSE_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-18 18:43         ` Ævar Arnfjörð Bjarmason
  2022-08-12 20:10       ` [PATCH v4 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
                         ` (3 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.

The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 .gitignore                     |  1 +
 Documentation/git-diagnose.txt | 50 +++++++++++++++++++++++++++++
 Makefile                       |  1 +
 builtin.h                      |  1 +
 builtin/diagnose.c             | 57 ++++++++++++++++++++++++++++++++++
 git.c                          |  1 +
 t/t0092-diagnose.sh            | 32 +++++++++++++++++++
 7 files changed, 143 insertions(+)
 create mode 100644 Documentation/git-diagnose.txt
 create mode 100644 builtin/diagnose.c
 create mode 100755 t/t0092-diagnose.sh

diff --git a/.gitignore b/.gitignore
index 42fd7253b44..80b530bbed2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,6 +53,7 @@
 /git-cvsimport
 /git-cvsserver
 /git-daemon
+/git-diagnose
 /git-diff
 /git-diff-files
 /git-diff-index
diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
new file mode 100644
index 00000000000..ce07dd0725d
--- /dev/null
+++ b/Documentation/git-diagnose.txt
@@ -0,0 +1,50 @@
+git-diagnose(1)
+================
+
+NAME
+----
+git-diagnose - Generate a zip archive of diagnostic information
+
+SYNOPSIS
+--------
+[verse]
+'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+
+DESCRIPTION
+-----------
+Collects detailed information about the user's machine, Git client, and
+repository state and packages that information into a zip archive. The
+generated archive can then, for example, be shared with the Git mailing list to
+help debug an issue or serve as a reference for independent debugging.
+
+The following information is captured in the archive:
+
+  * 'git version --build-options'
+  * The path to the repository root
+  * The available disk space on the filesystem
+  * The name and size of each packfile, including those in alternate object
+    stores
+  * The total count of loose objects, as well as counts broken down by
+    `.git/objects` subdirectory
+
+This tool differs from linkgit:git-bugreport[1] in that it collects much more
+detailed information with a greater focus on reporting the size and data shape
+of repository contents.
+
+OPTIONS
+-------
+-o <path>::
+--output-directory <path>::
+	Place the resulting diagnostics archive in `<path>` instead of the
+	current directory.
+
+-s <format>::
+--suffix <format>::
+	Specify an alternate suffix for the diagnostics archive name, to create
+	a file named 'git-diagnostics-<formatted suffix>'. This should take the
+	form of a strftime(3) format string; the current local time will be
+	used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ed66cb70e5a..d34f680c065 100644
--- a/Makefile
+++ b/Makefile
@@ -1153,6 +1153,7 @@ BUILTIN_OBJS += builtin/credential-cache.o
 BUILTIN_OBJS += builtin/credential-store.o
 BUILTIN_OBJS += builtin/credential.o
 BUILTIN_OBJS += builtin/describe.o
+BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-tree.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc8485..8901a34d6bf 100644
--- a/builtin.h
+++ b/builtin.h
@@ -144,6 +144,7 @@ int cmd_credential_cache(int argc, const char **argv, const char *prefix);
 int cmd_credential_cache_daemon(int argc, const char **argv, const char *prefix);
 int cmd_credential_store(int argc, const char **argv, const char *prefix);
 int cmd_describe(int argc, const char **argv, const char *prefix);
+int cmd_diagnose(int argc, const char **argv, const char *prefix);
 int cmd_diff_files(int argc, const char **argv, const char *prefix);
 int cmd_diff_index(int argc, const char **argv, const char *prefix);
 int cmd_diff(int argc, const char **argv, const char *prefix);
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
new file mode 100644
index 00000000000..832493bba65
--- /dev/null
+++ b/builtin/diagnose.c
@@ -0,0 +1,57 @@
+#include "builtin.h"
+#include "parse-options.h"
+#include "diagnose.h"
+
+static const char * const diagnose_usage[] = {
+	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
+	NULL
+};
+
+int cmd_diagnose(int argc, const char **argv, const char *prefix)
+{
+	struct strbuf zip_path = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	char *option_output = NULL;
+	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *prefixed_filename;
+
+	const struct option diagnose_options[] = {
+		OPT_STRING('o', "output-directory", &option_output, N_("path"),
+			   N_("specify a destination for the diagnostics archive")),
+		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
+			   N_("specify a strftime format suffix for the filename")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, diagnose_options,
+			     diagnose_usage, 0);
+
+	/* Prepare the path to put the result */
+	prefixed_filename = prefix_filename(prefix,
+					    option_output ? option_output : "");
+	strbuf_addstr(&zip_path, prefixed_filename);
+	strbuf_complete(&zip_path, '/');
+
+	strbuf_addstr(&zip_path, "git-diagnostics-");
+	strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_OK:
+	case SCLD_EXISTS:
+		break;
+	default:
+		die_errno(_("could not create leading directories for '%s'"),
+			  zip_path.buf);
+	}
+
+	/* Prepare diagnostics */
+	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
+		die_errno(_("unable to create diagnostics archive %s"),
+			  zip_path.buf);
+
+	free(prefixed_filename);
+	strbuf_release(&zip_path);
+	return 0;
+}
diff --git a/git.c b/git.c
index e5d62fa5a92..0b9d8ef7677 100644
--- a/git.c
+++ b/git.c
@@ -522,6 +522,7 @@ static struct cmd_struct commands[] = {
 	{ "credential-cache--daemon", cmd_credential_cache_daemon },
 	{ "credential-store", cmd_credential_store },
 	{ "describe", cmd_describe, RUN_SETUP },
+	{ "diagnose", cmd_diagnose, RUN_SETUP_GENTLY },
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
new file mode 100755
index 00000000000..b6923726fd7
--- /dev/null
+++ b/t/t0092-diagnose.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='git diagnose'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success UNZIP 'creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test >out &&
+	grep "Available space" out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents by default
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+	grep "^Total: [0-9][0-9]*" out
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 08/11] builtin/diagnose.c: add '--mode' option
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally
select non-default diagnostic information to include in the output archive.
Additionally, document the currently-available modes, emphasizing the
importance of not sharing a '--mode=all' archive publicly due to the
presence of sensitive information.

Note that the option parsing callback - 'option_parse_diagnose()' - is added
to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in
future callers configuring a diagnostics archive.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-diagnose.txt | 17 ++++++++++++++++-
 builtin/diagnose.c             |  8 ++++++--
 diagnose.c                     | 30 ++++++++++++++++++++++++++++++
 diagnose.h                     |  3 +++
 t/t0092-diagnose.sh            | 30 +++++++++++++++++++++++++++++-
 5 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-diagnose.txt b/Documentation/git-diagnose.txt
index ce07dd0725d..3ec8cc7ad72 100644
--- a/Documentation/git-diagnose.txt
+++ b/Documentation/git-diagnose.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+	       [--mode=<mode>]
 
 DESCRIPTION
 -----------
@@ -17,7 +18,7 @@ repository state and packages that information into a zip archive. The
 generated archive can then, for example, be shared with the Git mailing list to
 help debug an issue or serve as a reference for independent debugging.
 
-The following information is captured in the archive:
+By default, the following information is captured in the archive:
 
   * 'git version --build-options'
   * The path to the repository root
@@ -27,6 +28,9 @@ The following information is captured in the archive:
   * The total count of loose objects, as well as counts broken down by
     `.git/objects` subdirectory
 
+Additional information can be collected by selecting a different diagnostic mode
+using the `--mode` option.
+
 This tool differs from linkgit:git-bugreport[1] in that it collects much more
 detailed information with a greater focus on reporting the size and data shape
 of repository contents.
@@ -45,6 +49,17 @@ OPTIONS
 	form of a strftime(3) format string; the current local time will be
 	used.
 
+--mode=(stats|all)::
+	Specify the type of diagnostics that should be collected. The default behavior
+	of 'git diagnose' is equivalent to `--mode=stats`.
++
+The `--mode=all` option collects everything included in `--mode=stats`, as well
+as copies of `.git`, `.git/hooks`, `.git/info`, `.git/logs`, and
+`.git/objects/info` directories. This additional information may be sensitive,
+as it can be used to reconstruct the full contents of the diagnosed repository.
+Users should exercise caution when sharing an archive generated with
+`--mode=all`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/diagnose.c b/builtin/diagnose.c
index 832493bba65..cd260c20155 100644
--- a/builtin/diagnose.c
+++ b/builtin/diagnose.c
@@ -3,7 +3,7 @@
 #include "diagnose.h"
 
 static const char * const diagnose_usage[] = {
-	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
+	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>] [--mode=<mode>]"),
 	NULL
 };
 
@@ -12,6 +12,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	struct strbuf zip_path = STRBUF_INIT;
 	time_t now = time(NULL);
 	struct tm tm;
+	enum diagnose_mode mode = DIAGNOSE_STATS;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	char *prefixed_filename;
@@ -21,6 +22,9 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 			   N_("specify a destination for the diagnostics archive")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
 			   N_("specify a strftime format suffix for the filename")),
+		OPT_CALLBACK_F(0, "mode", &mode, N_("(stats|all)"),
+			       N_("specify the content of the diagnostic archive"),
+			       PARSE_OPT_NONEG, option_parse_diagnose),
 		OPT_END()
 	};
 
@@ -47,7 +51,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
 	}
 
 	/* Prepare diagnostics */
-	if (create_diagnostics_archive(&zip_path, DIAGNOSE_STATS))
+	if (create_diagnostics_archive(&zip_path, mode))
 		die_errno(_("unable to create diagnostics archive %s"),
 			  zip_path.buf);
 
diff --git a/diagnose.c b/diagnose.c
index 9270056db2f..beb0a8741ba 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -13,6 +13,36 @@ struct archive_dir {
 	int recursive;
 };
 
+struct diagnose_option {
+	enum diagnose_mode mode;
+	const char *option_name;
+};
+
+static struct diagnose_option diagnose_options[] = {
+	{ DIAGNOSE_STATS, "stats" },
+	{ DIAGNOSE_ALL, "all" },
+};
+
+int option_parse_diagnose(const struct option *opt, const char *arg, int unset)
+{
+	int i;
+	enum diagnose_mode *diagnose = opt->value;
+
+	if (!arg) {
+		*diagnose = unset ? DIAGNOSE_NONE : DIAGNOSE_STATS;
+		return 0;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(diagnose_options); i++) {
+		if (!strcmp(arg, diagnose_options[i].option_name)) {
+			*diagnose = diagnose_options[i].mode;
+			return 0;
+		}
+	}
+
+	return error(_("invalid --%s value '%s'"), opt->long_name, arg);
+}
+
 static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
 				   const char *file_name, void *data)
 {
diff --git a/diagnose.h b/diagnose.h
index 998775857a0..7a4951a7863 100644
--- a/diagnose.h
+++ b/diagnose.h
@@ -2,6 +2,7 @@
 #define DIAGNOSE_H
 
 #include "strbuf.h"
+#include "parse-options.h"
 
 enum diagnose_mode {
 	DIAGNOSE_NONE,
@@ -9,6 +10,8 @@ enum diagnose_mode {
 	DIAGNOSE_ALL
 };
 
+int option_parse_diagnose(const struct option *opt, const char *arg, int unset);
+
 int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode);
 
 #endif /* DIAGNOSE_H */
diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
index b6923726fd7..fca9b58489c 100755
--- a/t/t0092-diagnose.sh
+++ b/t/t0092-diagnose.sh
@@ -26,7 +26,35 @@ test_expect_success UNZIP 'creates diagnostics zip archive' '
 
 	# Should not include .git directory contents by default
 	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
-	grep "^Total: [0-9][0-9]*" out
+'
+
+test_expect_success UNZIP '--mode=stats excludes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test --mode=stats >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Does not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--mode=all includes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git diagnose -o report -s test --mode=all >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Includes .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 09/11] builtin/bugreport.c: create '--diagnose' option
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.

The '--diagnose' option behaves effectively as an alias for simultaneously
running 'git bugreport' and 'git diagnose'. In the documentation, users are
explicitly recommended to attach the diagnostics alongside a bug report to
provide additional context to readers, ideally reducing some back-and-forth
between reporters and those debugging the issue.

Note that '--diagnose' may take an optional string arg (either 'stats' or
'all'). If specified without the arg, the behavior corresponds to running
'git diagnose' without '--mode'. As with 'git diagnose', this default is
intended to help reduce unintentional leaking of sensitive information).
Users can also explicitly specify '--diagnose=(stats|all)' to generate the
respective archive created by 'git diagnose --mode=(stats|all)'.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-bugreport.txt | 18 +++++++++++++
 builtin/bugreport.c             | 27 ++++++++++++++++---
 t/t0091-bugreport.sh            | 48 +++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
index d8817bf3cec..eca726e5791 100644
--- a/Documentation/git-bugreport.txt
+++ b/Documentation/git-bugreport.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
+		[--diagnose[=<mode>]]
 
 DESCRIPTION
 -----------
@@ -31,6 +32,10 @@ The following information is captured automatically:
  - A list of enabled hooks
  - $SHELL
 
+Additional information may be gathered into a separate zip archive using the
+`--diagnose` option, and can be attached alongside the bugreport document to
+provide additional context to readers.
+
 This tool is invoked via the typical Git setup process, which means that in some
 cases, it might not be able to launch - for example, if a relevant config file
 is unreadable. In this kind of scenario, it may be helpful to manually gather
@@ -49,6 +54,19 @@ OPTIONS
 	named 'git-bugreport-<formatted suffix>'. This should take the form of a
 	strftime(3) format string; the current local time will be used.
 
+--no-diagnose::
+--diagnose[=<mode>]::
+	Create a zip archive of supplemental information about the user's
+	machine, Git client, and repository state. The archive is written to the
+	same output directory as the bug report and is named
+	'git-diagnostics-<formatted suffix>'.
++
+Without `mode` specified, the diagnostic archive will contain the default set of
+statistics reported by `git diagnose`. An optional `mode` value may be specified
+to change which information is included in the archive. See
+linkgit:git-diagnose[1] for the list of valid values for `mode` and details
+about their usage.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 9de32bc96e7..530895be55f 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -5,6 +5,7 @@
 #include "compat/compiler.h"
 #include "hook.h"
 #include "hook-list.h"
+#include "diagnose.h"
 
 
 static void get_system_info(struct strbuf *sys_info)
@@ -59,7 +60,7 @@ static void get_populated_hooks(struct strbuf *hook_info, int nongit)
 }
 
 static const char * const bugreport_usage[] = {
-	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
+	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>] [--diagnose[=<mode>]"),
 	NULL
 };
 
@@ -98,16 +99,21 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	int report = -1;
 	time_t now = time(NULL);
 	struct tm tm;
+	enum diagnose_mode diagnose = DIAGNOSE_NONE;
 	char *option_output = NULL;
 	char *option_suffix = "%Y-%m-%d-%H%M";
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
+	size_t output_path_len;
 
 	const struct option bugreport_options[] = {
+		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("mode"),
+			       N_("create an additional zip archive of detailed diagnostics (default 'stats')"),
+			       PARSE_OPT_OPTARG, option_parse_diagnose),
 		OPT_STRING('o', "output-directory", &option_output, N_("path"),
-			   N_("specify a destination for the bugreport file")),
+			   N_("specify a destination for the bugreport file(s)")),
 		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
-			   N_("specify a strftime format suffix for the filename")),
+			   N_("specify a strftime format suffix for the filename(s)")),
 		OPT_END()
 	};
 
@@ -119,6 +125,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 					    option_output ? option_output : "");
 	strbuf_addstr(&report_path, prefixed_filename);
 	strbuf_complete(&report_path, '/');
+	output_path_len = report_path.len;
 
 	strbuf_addstr(&report_path, "git-bugreport-");
 	strbuf_addftime(&report_path, option_suffix, localtime_r(&now, &tm), 0, 0);
@@ -133,6 +140,20 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		    report_path.buf);
 	}
 
+	/* Prepare diagnostics, if requested */
+	if (diagnose != DIAGNOSE_NONE) {
+		struct strbuf zip_path = STRBUF_INIT;
+		strbuf_add(&zip_path, report_path.buf, output_path_len);
+		strbuf_addstr(&zip_path, "git-diagnostics-");
+		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
+		strbuf_addstr(&zip_path, ".zip");
+
+		if (create_diagnostics_archive(&zip_path, diagnose))
+			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
+
+		strbuf_release(&zip_path);
+	}
+
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
index 08f5fe9caef..b6d2f591acd 100755
--- a/t/t0091-bugreport.sh
+++ b/t/t0091-bugreport.sh
@@ -78,4 +78,52 @@ test_expect_success 'indicates populated hooks' '
 	test_cmp expect actual
 '
 
+test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose -o report -s test >out &&
+
+	zip_path=report/git-diagnostics-test.zip &&
+	grep "Available space" out &&
+	test_path_is_file "$zip_path" &&
+
+	# Check zipped archive content
+	"$GIT_UNZIP" -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	"$GIT_UNZIP" -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [0-9][0-9]*" out &&
+
+	# Should not include .git directory contents by default
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=stats excludes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=stats -o report -s test >out &&
+
+	# Includes pack quantity/size info
+	"$GIT_UNZIP" -p "$zip_path" packs-local.txt >out &&
+	grep ".git/objects" out &&
+
+	# Does not include .git directory contents
+	! "$GIT_UNZIP" -l "$zip_path" | grep ".git/"
+'
+
+test_expect_success UNZIP '--diagnose=all includes .git dir contents' '
+	test_when_finished rm -rf report &&
+
+	git bugreport --diagnose=all -o report -s test >out &&
+
+	# Includes .git directory contents
+	"$GIT_UNZIP" -l "$zip_path" | grep ".git/" &&
+
+	"$GIT_UNZIP" -p "$zip_path" .git/HEAD >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 10/11] scalar-diagnose: use 'git diagnose --mode=all'
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (8 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  2022-08-12 20:10       ` [PATCH v4 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Replace implementation of 'scalar diagnose' with an internal invocation of
'git diagnose --mode=all'. This simplifies the implementation of
'cmd_diagnose' by making it a direct alias of 'git diagnose' and removes
some code in 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The
simplicity of the alias also sets up a clean deprecation path for 'scalar
diagnose' (in favor of 'git diagnose'), if that is desired in the future.

This introduces one minor change to the output of 'scalar diagnose', which
is that the prefix of the created zip archive is changed from 'scalar_' to
'git-diagnostics-'.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d538b8b8f14..68571ce195f 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,7 +11,6 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
-#include "diagnose.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -510,34 +509,19 @@ static int cmd_diagnose(int argc, const char **argv)
 		N_("scalar diagnose [<enlistment>]"),
 		NULL
 	};
-	struct strbuf zip_path = STRBUF_INIT;
-	time_t now = time(NULL);
-	struct tm tm;
+	struct strbuf diagnostics_root = STRBUF_INIT;
 	int res = 0;
 
 	argc = parse_options(argc, argv, NULL, options,
 			     usage, 0);
 
-	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
-
-	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
-	strbuf_addftime(&zip_path,
-			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
-	strbuf_addstr(&zip_path, ".zip");
-	switch (safe_create_leading_directories(zip_path.buf)) {
-	case SCLD_EXISTS:
-	case SCLD_OK:
-		break;
-	default:
-		error_errno(_("could not create directory for '%s'"),
-			    zip_path.buf);
-		goto diagnose_cleanup;
-	}
+	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
+	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
 
-	res = create_diagnostics_archive(&zip_path, DIAGNOSE_ALL);
+	res = run_git("diagnose", "--mode=all", "-s", "%Y%m%d_%H%M%S",
+		      "-o", diagnostics_root.buf, NULL);
 
-diagnose_cleanup:
-	strbuf_release(&zip_path);
+	strbuf_release(&diagnostics_root);
 	return res;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 11/11] scalar: update technical doc roadmap
  2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
                         ` (9 preceding siblings ...)
  2022-08-12 20:10       ` [PATCH v4 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
@ 2022-08-12 20:10       ` Victoria Dye via GitGitGadget
  10 siblings, 0 replies; 94+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-12 20:10 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update the Scalar roadmap to reflect the completion of generalizing 'scalar
diagnose' into 'git diagnose' and 'git bugreport --diagnose'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/technical/scalar.txt | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
index 08bc09c225a..f6353375f08 100644
--- a/Documentation/technical/scalar.txt
+++ b/Documentation/technical/scalar.txt
@@ -84,6 +84,9 @@ series have been accepted:
 
 - `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
 
+- `scalar-generalize-diagnose`: Move the functionality of `scalar diagnose`
+  into `git diagnose` and `git bugreport --diagnose`.
+
 Roughly speaking (and subject to change), the following series are needed to
 "finish" this initial version of Scalar:
 
@@ -91,12 +94,6 @@ Roughly speaking (and subject to change), the following series are needed to
   and implement `scalar help`. At the end of this series, Scalar should be
   feature-complete from the perspective of a user.
 
-- Generalize features not specific to Scalar: In the spirit of making Scalar
-  configure only what is needed for large repo performance, move common
-  utilities into other parts of Git. Some of this will be internal-only, but one
-  major change will be generalizing `scalar diagnose` for use with any Git
-  repository.
-
 - Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
   `git`, including updates to build and install it with the rest of Git. This
   change will incorporate Scalar into the Git CI and test framework, as well as
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 05/11] scalar-diagnose: move functionality to common location
  2022-08-12 20:10       ` [PATCH v4 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
@ 2022-08-12 20:26         ` Junio C Hamano
  2022-08-12 21:00           ` Victoria Dye
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-12 20:26 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> +			    the_repository, NULL, 0);
> +	if (res) {
> +		error(_("failed to write archive"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	fprintf(stderr, "\n"
> +		"Diagnostics complete.\n"
> +		"All of the gathered info is captured in '%s'\n",
> +		zip_path->buf);
> +
> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		dup2(stdout_fd, STDOUT_FILENO);
> +		close(stdout_fd);
> +		close(archiver_fd);
> +	}

Hmph, after reading 

https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/

I would have expected to see the above part more like:

                res = write_archive(...);

        diagnose_cleanup:
                if (res)
                        error(...);
                else
                        fprintf(stderr, "Diag complete");

                if (archiver_fd >= 0) {
                        ...


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 06/11] diagnose.c: add option to configure archive contents
  2022-08-12 20:10       ` [PATCH v4 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
@ 2022-08-12 20:31         ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-12 20:31 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -int create_diagnostics_archive(struct strbuf *zip_path)
> +int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode)
>  {
>  	struct strvec archiver_args = STRVEC_INIT;
>  	char **argv_copy = NULL;
>  	int stdout_fd = -1, archiver_fd = -1;
>  	struct strbuf buf = STRBUF_INIT;
> -	int res;
> +	int res, i;
> +	struct archive_dir archive_dirs[] = {
> +		{ ".git", 0 },
> +		{ ".git/hooks", 0 },
> +		{ ".git/info", 0 },
> +		{ ".git/logs", 1 },
> +		{ ".git/objects/info", 0 }
> +	};
> +
> +	if (mode == DIAGNOSE_NONE) {
> +		res = 0;
> +		goto diagnose_cleanup;
> +	}
>  
>  	stdout_fd = dup(STDOUT_FILENO);
>  	if (stdout_fd < 0) {
> @@ -177,12 +194,18 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>  	loose_objs_stats(&buf, ".git/objects");
>  	strvec_push(&archiver_args, buf.buf);
>  
> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> -		goto diagnose_cleanup;
> +	/* Only include this if explicitly requested */
> +	if (mode == DIAGNOSE_ALL) {
> +		for (i = 0; i < ARRAY_SIZE(archive_dirs); i++) {
> +			if (add_directory_to_archiver(&archiver_args,
> +						      archive_dirs[i].path,
> +						      archive_dirs[i].recursive)) {
> +				res = error_errno(_("could not add directory '%s' to archiver"),
> +						  archive_dirs[i].path);
> +				goto diagnose_cleanup;
> +			}
> +		}
> +	}

Even without the "only include under DIAGNOSE_ALL" support added by
this step, the table-driven organization is much nicer.  The earlier
"move to common" step aimed to be as close as pure move, so this
step is our first opportunity to make this clean-up, so I do not
mind too much about this step doing two unrelated things (one is to
clean-up the if (A||B||C) into a loop over A, B and C, the other is
to introduce the diagnose_mode) at once.

Thanks.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 05/11] scalar-diagnose: move functionality to common location
  2022-08-12 20:26         ` Junio C Hamano
@ 2022-08-12 21:00           ` Victoria Dye
  2022-08-12 21:20             ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Victoria Dye @ 2022-08-12 21:00 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin,
	Ævar Arnfjörð Bjarmason

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
>> +			    the_repository, NULL, 0);
>> +	if (res) {
>> +		error(_("failed to write archive"));
>> +		goto diagnose_cleanup;
>> +	}
>> +
>> +	fprintf(stderr, "\n"
>> +		"Diagnostics complete.\n"
>> +		"All of the gathered info is captured in '%s'\n",
>> +		zip_path->buf);
>> +
>> +diagnose_cleanup:
>> +	if (archiver_fd >= 0) {
>> +		dup2(stdout_fd, STDOUT_FILENO);
>> +		close(stdout_fd);
>> +		close(archiver_fd);
>> +	}
> 
> Hmph, after reading 
> 
> https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/
> 
> I would have expected to see the above part more like:
> 
>                 res = write_archive(...);
> 
>         diagnose_cleanup:
>                 if (res)
>                         error(...);
>                 else
>                         fprintf(stderr, "Diag complete");
> 
>                 if (archiver_fd >= 0) {
>                         ...
> 

I originally planned to implement it this way, but instead went with adding
an error printout explicitly for failed 'add_directory_to_archiver()' calls
(in patch 6 [1]). I elaborated on the thought process/reasoning for
modifying the approach in the cover letter [2]:

> Improved error reporting in 'create_diagnostics_archive()'. I was
> originally going to modify the "failed to write archive" error to trigger
> whenever 'create_diagnostics_archive()' returned a nonzero value.
> However, while working on it I realized the message would no longer be
> tied to a failure of 'write_archive()', making it less helpful in
> pinpointing an issue. To address the original issue
> ('add_directory_to_archiver()' silently failing in
> 'create_diagnostics_archive()'), I instead refactored those calls into a
> loop and added the error message. Now, there's exactly one error message
> printed for each possible early exit scenario from
> 'create_diagnostics_archive()', hopefully avoiding both redundancy &
> under-reporting.

To add a bit more context: when I used the "move 'diagnose_cleanup'"
approach, I felt that the added message was either redundant or too general
to help a user identify an issue. Redundancy appeared when, for example,
'dup2()' returned a nonzero code; if that happened, we'd get the printouts:

$ git diagnose --suffix test
error: could not redirect output: <ERRNO error message>
error: could not write archive
error: unable to create diagnostics archive 'git-diagnostics-test.zip': <ERRNO error message>

The first two are from 'create_diagnostics_archive()', and the second
doesn't really give us information that we don't get out of the third (in
'builtin/diagnose.c').

Conversely, a failure in 'add_directory_to_archiver()' and 'write_archive()'
would give us the same printouts (at least within the scope of
'diagnose.c'/'builtin/diagnose.c'):

$ git diagnose --suffix test
error: could not write archive
error: unable to create diagnostics archive 'git-diagnostics-test.zip: <some error message>

Previously, "could not write archive" would point someone debugging to
'write_archive()'; now, it's unclear.

I ended up settling on adding the error message directly to the
'add_directory_to_archiver()' loop in patch 6 because it meant that:

1. 'create_diagnostics_archive()' would only ever print one error message;
   the others printed would indicate (IMO more clearly) where in the call
   stack the error happened
2. there would be a unique error message for each condition that caused
   'create_diagnostics_archive()' to exit early

Apologies for not sending another reply with these details before
re-rolling. I'll be more direct when changing plans in the future.

Thanks!

[1] https://lore.kernel.org/git/710b67e5776363d199ed5043d019386819d44e7e.1660335019.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.1310.v4.git.1660335019.gitgitgadget@gmail.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 05/11] scalar-diagnose: move functionality to common location
  2022-08-12 21:00           ` Victoria Dye
@ 2022-08-12 21:20             ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-12 21:20 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Ævar Arnfjörð Bjarmason

Victoria Dye <vdye@github.com> writes:

>> Improved error reporting in 'create_diagnostics_archive()'. I was
>> originally going to modify the "failed to write archive" error to trigger
>> whenever 'create_diagnostics_archive()' returned a nonzero value.
>> However, while working on it I realized the message would no longer be
>> tied to a failure of 'write_archive()', making it less helpful in
>> pinpointing an issue. To address the original issue
>> ('add_directory_to_archiver()' silently failing in
>> 'create_diagnostics_archive()'), I instead refactored those calls into a
>> loop and added the error message. Now, there's exactly one error message
>> printed for each possible early exit scenario from
>> 'create_diagnostics_archive()', hopefully avoiding both redundancy &
>> under-reporting.

Ah, I see.  I probably should have read the cover letter before
responding.  I try to understand the new iteration _without_ relying
on the cover letter first, to ensure that the resulting history is
still understandable; when I see something questionable, however, I
should see if cover letter gives more context and clues.  Sorry for
the noise.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-12 20:10       ` [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
@ 2022-08-18 18:43         ` Ævar Arnfjörð Bjarmason
  2022-08-18 19:12           ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-18 18:43 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, johannes.schindelin, Victoria Dye


On Fri, Aug 12 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]

This is correct:

> +'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]

...

> +	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),

But this is not, it's missing () around the short v.s. long option, and
we should have a space surrounding the "|" as well.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin
  2022-08-18 18:43         ` Ævar Arnfjörð Bjarmason
@ 2022-08-18 19:12           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-18 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	johannes.schindelin, Victoria Dye

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Aug 12 2022, Victoria Dye via GitGitGadget wrote:
>
>> From: Victoria Dye <vdye@github.com>
>> [...]
>
> This is correct:
>
>> +'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
>
> ...
>
>> +	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
>
> But this is not, it's missing () around the short v.s. long option, and
> we should have a space surrounding the "|" as well.

You are commenting on what appears inside N_() and I agree that it
should match the other one.

It is kind of sad that our usage strings do not allow proper
translation and instead force translaters to _know_ that they are
supposed to touch _only_ the placeholder strings (path and format in
this case).

Not your fault, or Victoria's, of course ;-)

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2022-08-18 19:12 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-01 21:14 [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Victoria Dye via GitGitGadget
2022-08-01 21:14 ` [PATCH 1/7] scalar: use "$GIT_UNZIP" in 'scalar diagnose' test Victoria Dye via GitGitGadget
2022-08-01 21:46   ` Junio C Hamano
2022-08-01 21:14 ` [PATCH 2/7] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
2022-08-01 22:16   ` Junio C Hamano
2022-08-02 15:40     ` Victoria Dye
2022-08-02  2:17   ` Ævar Arnfjörð Bjarmason
2022-08-01 21:14 ` [PATCH 3/7] builtin/bugreport.c: avoid size_t overflow Victoria Dye via GitGitGadget
2022-08-01 22:18   ` Junio C Hamano
2022-08-02 16:26     ` Victoria Dye
2022-08-02 20:51       ` Junio C Hamano
2022-08-02  2:03   ` Ævar Arnfjörð Bjarmason
2022-08-02 16:26     ` Victoria Dye
2022-08-03 12:25       ` Ævar Arnfjörð Bjarmason
2022-08-01 21:14 ` [PATCH 4/7] builtin/bugreport.c: add directory to archiver more gently Victoria Dye via GitGitGadget
2022-08-01 22:22   ` Junio C Hamano
2022-08-02 15:43     ` Victoria Dye
2022-08-01 21:14 ` [PATCH 5/7] builtin/bugreport.c: add '--no-report' option Victoria Dye via GitGitGadget
2022-08-01 22:31   ` Junio C Hamano
2022-08-02 19:46     ` Victoria Dye
2022-08-01 21:14 ` [PATCH 6/7] scalar: use 'git bugreport --diagnose' in 'scalar diagnose' Victoria Dye via GitGitGadget
2022-08-01 21:14 ` [PATCH 7/7] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
2022-08-01 21:34 ` [PATCH 0/7] Generalize 'scalar diagnose' into 'git bugreport --diagnose' Junio C Hamano
2022-08-02  2:49 ` Ævar Arnfjörð Bjarmason
2022-08-02 19:48   ` Victoria Dye
2022-08-03 12:34     ` Ævar Arnfjörð Bjarmason
2022-08-04  1:45 ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and " Victoria Dye via GitGitGadget
2022-08-04  1:45   ` [PATCH v2 01/10] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
2022-08-04  1:45   ` [PATCH v2 02/10] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
2022-08-04  1:45   ` [PATCH v2 03/10] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
2022-08-04  6:19     ` Ævar Arnfjörð Bjarmason
2022-08-04 17:12       ` Junio C Hamano
2022-08-04 20:12         ` Ævar Arnfjörð Bjarmason
2022-08-04 21:09           ` Junio C Hamano
2022-08-04  1:45   ` [PATCH v2 04/10] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
2022-08-04  1:45   ` [PATCH v2 05/10] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
2022-08-04  6:24     ` Ævar Arnfjörð Bjarmason
2022-08-04  1:45   ` [PATCH v2 06/10] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
2022-08-04  6:27     ` Ævar Arnfjörð Bjarmason
2022-08-05 19:38       ` Derrick Stolee
2022-08-11 11:06         ` Ævar Arnfjörð Bjarmason
2022-08-05 19:11     ` Derrick Stolee
2022-08-04  1:45   ` [PATCH v2 07/10] builtin/diagnose.c: gate certain data behind '--all' Victoria Dye via GitGitGadget
2022-08-04  6:39     ` Ævar Arnfjörð Bjarmason
2022-08-04  1:45   ` [PATCH v2 08/10] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
2022-08-05 19:35     ` Derrick Stolee
2022-08-09 23:53       ` Victoria Dye
2022-08-10 12:52         ` Derrick Stolee
2022-08-10 16:13           ` Victoria Dye
2022-08-10 16:47             ` Derrick Stolee
2022-08-04  1:45   ` [PATCH v2 09/10] scalar-diagnose: use 'git diagnose --all' Victoria Dye via GitGitGadget
2022-08-04  6:54     ` Ævar Arnfjörð Bjarmason
2022-08-09 16:54       ` Victoria Dye
2022-08-04  1:45   ` [PATCH v2 10/10] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
2022-08-04 17:22   ` [PATCH v2 00/10] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Junio C Hamano
2022-08-09 16:17     ` Victoria Dye
2022-08-09 16:50       ` Junio C Hamano
2022-08-10 23:34   ` [PATCH v3 00/11] " Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
2022-08-11  0:16       ` Junio C Hamano
2022-08-12 17:00         ` Victoria Dye
2022-08-11 10:51       ` Ævar Arnfjörð Bjarmason
2022-08-11 15:43         ` Victoria Dye
2022-08-10 23:34     ` [PATCH v3 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
2022-08-11 10:53       ` Ævar Arnfjörð Bjarmason
2022-08-11 15:40         ` Victoria Dye
2022-08-11 20:30           ` Ævar Arnfjörð Bjarmason
2022-08-10 23:34     ` [PATCH v3 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
2022-08-10 23:34     ` [PATCH v3 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget
2022-08-12 20:10     ` [PATCH v4 00/11] Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 01/11] scalar-diagnose: use "$GIT_UNZIP" in test Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 02/11] scalar-diagnose: avoid 32-bit overflow of size_t Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 03/11] scalar-diagnose: add directory to archiver more gently Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 04/11] scalar-diagnose: move 'get_disk_info()' to 'compat/' Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 05/11] scalar-diagnose: move functionality to common location Victoria Dye via GitGitGadget
2022-08-12 20:26         ` Junio C Hamano
2022-08-12 21:00           ` Victoria Dye
2022-08-12 21:20             ` Junio C Hamano
2022-08-12 20:10       ` [PATCH v4 06/11] diagnose.c: add option to configure archive contents Victoria Dye via GitGitGadget
2022-08-12 20:31         ` Junio C Hamano
2022-08-12 20:10       ` [PATCH v4 07/11] builtin/diagnose.c: create 'git diagnose' builtin Victoria Dye via GitGitGadget
2022-08-18 18:43         ` Ævar Arnfjörð Bjarmason
2022-08-18 19:12           ` Junio C Hamano
2022-08-12 20:10       ` [PATCH v4 08/11] builtin/diagnose.c: add '--mode' option Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 09/11] builtin/bugreport.c: create '--diagnose' option Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 10/11] scalar-diagnose: use 'git diagnose --mode=all' Victoria Dye via GitGitGadget
2022-08-12 20:10       ` [PATCH v4 11/11] scalar: update technical doc roadmap Victoria Dye via GitGitGadget

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.