git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bugreport: add tool to generate debugging info
@ 2019-08-15  2:34 Emily Shaffer
  2019-08-15 14:15 ` Derrick Stolee
                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15  2:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Make it easier for users who encounter a bug to send a report by
collecting some state information, intended to be viewed by humans
familiar with Git.

Teach Git how to prompt the user for a good bug report: reproduction
steps, expected behavior, and actual behavior. Also, teach Git to write
down its own version, the version of some of its dependencies, the
operating system information, the effective gitconfig, the configured
hooks, and the contents of $GIT_DIR/logs. Finally, make sure Git asks
the user to review the contents of the report after it's generated.

If users can send us a well-written bug report complete with system
information, a remote we may be able to clone, and a reflog showing us
where the problem occurred, we can reduce the number of questions and
answers which must travel between the reporter and the Git contributor.

Users may also wish to send a report like this to their local "Git
expert" if they have put their repository into a state they are confused
by.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
There are some things I'm not certain about from this patch, I'd
appreciate feedback.

 - Do we want to advertise the Git mailing list for bug reports?
 - Which config options should we filter to avoid accidentally receiving
   credentials?
 - At the moment, the test is trying to check "everything we thought we
   would populate got populated" - that seemed to me like it would hold
   up best to changes to the set of information being collected. But
   maybe there's something more robust we can do.

And of course, if there are suggestions for other things to include in
the report I'm interested in adding.

An example of the output can be found here:
https://gist.github.com/nasamuffin/2e320892f5c2147e829cbcf5bd0759a2

Thanks.
 - Emily

 .gitignore                      |  1 +
 Documentation/git-bugreport.txt | 48 ++++++++++++++++++
 Makefile                        |  1 +
 command-list.txt                |  1 +
 git-bugreport.sh                | 86 +++++++++++++++++++++++++++++++++
 t/t0091-bugreport.sh            | 41 ++++++++++++++++
 6 files changed, 178 insertions(+)
 create mode 100644 Documentation/git-bugreport.txt
 create mode 100755 git-bugreport.sh
 create mode 100755 t/t0091-bugreport.sh

diff --git a/.gitignore b/.gitignore
index 521d8f4fb4..b4f5433084 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,6 +25,7 @@
 /git-bisect--helper
 /git-blame
 /git-branch
+/git-bugreport
 /git-bundle
 /git-cat-file
 /git-check-attr
diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
new file mode 100644
index 0000000000..c5f45bbee8
--- /dev/null
+++ b/Documentation/git-bugreport.txt
@@ -0,0 +1,48 @@
+git-bugreport(1)
+================
+
+NAME
+----
+git-bugreport - Collect information for user to file a bug report
+
+SYNOPSIS
+--------
+[verse]
+'git bugreport' [-o | --output <path>]
+
+DESCRIPTION
+-----------
+Captures information about the user's machine, Git client, and repository state,
+as well as a form requesting information about the behavior the user observed,
+into a single text file which the user can then share, for example to the Git
+mailing list, in order to report an observed bug.
+
+The following information is requested from the user:
+
+ - Reproduction steps
+ - Expected behavior
+ - Actual behavior
+
+The following information is captured automatically:
+
+ - Git version (`git version --build-options`)
+ - Machine information (`uname -a`)
+ - Versions of various dependencies
+ - Git config contents (`git config --show-origin --list`)
+ - The contents of all configured git-hooks in `.git/hooks/`
+ - The contents of `.git/logs`
+
+OPTIONS
+-------
+-o [<path>]::
+--output [<path>]::
+	Place the resulting bug report file in <path> instead of the root of the
+	Git repository.
+
+NOTE
+----
+Bug reports can be sent to git@vger.kernel.org.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index f9255344ae..69801a1c45 100644
--- a/Makefile
+++ b/Makefile
@@ -606,6 +606,7 @@ TEST_PROGRAMS_NEED_X =
 unexport CDPATH
 
 SCRIPT_SH += git-bisect.sh
+SCRIPT_SH += git-bugreport.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
diff --git a/command-list.txt b/command-list.txt
index a9ac72bef4..be5a605047 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -54,6 +54,7 @@ git-archive                             mainporcelain
 git-bisect                              mainporcelain           info
 git-blame                               ancillaryinterrogators          complete
 git-branch                              mainporcelain           history
+git-bugreport                           ancillaryinterrogators
 git-bundle                              mainporcelain
 git-cat-file                            plumbinginterrogators
 git-check-attr                          purehelpers
diff --git a/git-bugreport.sh b/git-bugreport.sh
new file mode 100755
index 0000000000..2200703a51
--- /dev/null
+++ b/git-bugreport.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+
+print_filenames_and_content() {
+while read -r line; do
+	echo "$line";
+	echo "========";
+	cat "$line";
+	echo;
+done
+}
+
+generate_report_text() {
+
+	# Generate a form for the user to fill out with their issue.
+	gettextln "Thank you for filling out a Git bug report!"
+	gettextln "Please answer the following questions to help us understand your issue."
+	echo
+	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
+	echo
+	gettextln "What did you expect to happen? (Expected behavior)"
+	echo
+	gettextln "What happened instead? (Actual behavior)"
+	echo
+	gettextln "Anything else you want to add:"
+	echo
+	gettextln "Please review the rest of the bug report below."
+	gettextln "You can delete any lines you don't wish to send."
+	echo
+
+	echo "[System Information]"
+	git version --build-options
+	uname -a
+	curl-config --version
+	ldd --version
+	echo
+
+	echo "[Git Config]"
+	# TODO: Pass this through grep -v to avoid users sending us their credentials.
+	git config --show-origin --list
+	echo
+
+	echo "[Configured Hooks]"
+	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
+	echo
+
+	echo "[Git Logs]"
+	find "$GIT_DIR/logs" -type f | print_filenames_and_content
+	echo
+
+}
+
+USAGE="[-o | --output <path>]"
+
+SUBDIRECTORY_OK=t
+OPTIONS_SPEC=
+. git-sh-setup
+. git-sh-i18n
+
+basedir="$PWD"
+while :
+do
+	case "$1" in
+	-o|--output)
+		shift
+		basedir="$1"
+		shift
+		continue
+		;;
+	"")
+		break
+		;;
+	*)
+		usage
+		;;
+	esac
+done
+
+
+# Create bugreport file
+BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"
+
+generate_report_text >$BUGREPORT_FILE
+
+git_editor $BUGREPORT_FILE
+
+eval_gettextln "Your new bug report is in \$BUGREPORT_FILE."
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
new file mode 100755
index 0000000000..6eb2ee4f66
--- /dev/null
+++ b/t/t0091-bugreport.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+test_description='git bugreport'
+
+. ./test-lib.sh
+
+# Headers "[System Info]" will be followed by a non-empty line if we put some
+# information there; we can make sure all our headers were followed by some
+# information to check if the command was successful.
+HEADER_PATTERN="^\[.*\]$"
+check_all_headers_populated() {
+	while read -r line; do
+		if [$(grep $HEADER_PATTERN $line)]; then
+			read -r nextline
+			if [-z $nextline]; then
+				return 1;
+			fi
+		fi
+	done
+}
+
+test_expect_success 'creates a report with content in the right places' '
+	git bugreport &&
+	check_all_headers_populated <git-bugreport-* &&
+	rm git-bugreport-*
+'
+
+test_expect_success '--output puts the report in the provided dir' '
+	mkdir foo/ &&
+	git bugreport -o foo/ &&
+	test -f foo/git-bugreport-* &&
+	rm -fr foo/
+'
+
+test_expect_success 'incorrect arguments abort with usage' '
+	test_must_fail git bugreport --false 2>output &&
+	grep usage output &&
+	test ! -f git-bugreport-*
+'
+
+test_done
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15  2:34 [PATCH] bugreport: add tool to generate debugging info Emily Shaffer
@ 2019-08-15 14:15 ` Derrick Stolee
  2019-08-15 14:36   ` Junio C Hamano
                     ` (2 more replies)
  2019-08-15 18:10 ` Junio C Hamano
  2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
  2 siblings, 3 replies; 68+ messages in thread
From: Derrick Stolee @ 2019-08-15 14:15 UTC (permalink / raw)
  To: Emily Shaffer, git

On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> Make it easier for users who encounter a bug to send a report by
> collecting some state information, intended to be viewed by humans
> familiar with Git.

This is an excellent idea! VFS for Git has a similar "diagnose" command
that collects logs, config, and other details (like packfile sizes and
loose object counts). That has been a critical tool for supporting users.

> Teach Git how to prompt the user for a good bug report: reproduction
> steps, expected behavior, and actual behavior. Also, teach Git to write
> down its own version, the version of some of its dependencies, the
> operating system information, the effective gitconfig, the configured
> hooks, and the contents of $GIT_DIR/logs. Finally, make sure Git asks
> the user to review the contents of the report after it's generated.
> 
> If users can send us a well-written bug report complete with system
> information, a remote we may be able to clone, and a reflog showing us
> where the problem occurred, we can reduce the number of questions and
> answers which must travel between the reporter and the Git contributor.
> 
> Users may also wish to send a report like this to their local "Git
> expert" if they have put their repository into a state they are confused
> by.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> There are some things I'm not certain about from this patch, I'd
> appreciate feedback.
> 
>  - Do we want to advertise the Git mailing list for bug reports?

That is possible. Isn't there another mailing list for git users?

I could see a patch added on top of this for git-for-windows/git that
changes the instructions to create issues on GitHub.

>  - Which config options should we filter to avoid accidentally receiving
>    credentials?

The remote URLs are pretty sensitive. Not only do users sometimes put passwords
or PATs into their URLs, the literal name of the repo could be a secret.

>  - At the moment, the test is trying to check "everything we thought we
>    would populate got populated" - that seemed to me like it would hold
>    up best to changes to the set of information being collected. But
>    maybe there's something more robust we can do.
> 
> And of course, if there are suggestions for other things to include in
> the report I'm interested in adding.
> 
> An example of the output can be found here:
> https://gist.github.com/nasamuffin/2e320892f5c2147e829cbcf5bd0759a2
> 
> Thanks.
>  - Emily
> 
>  .gitignore                      |  1 +
>  Documentation/git-bugreport.txt | 48 ++++++++++++++++++
>  Makefile                        |  1 +
>  command-list.txt                |  1 +
>  git-bugreport.sh                | 86 +++++++++++++++++++++++++++++++++
>  t/t0091-bugreport.sh            | 41 ++++++++++++++++
>  6 files changed, 178 insertions(+)
>  create mode 100644 Documentation/git-bugreport.txt
>  create mode 100755 git-bugreport.sh
>  create mode 100755 t/t0091-bugreport.sh
> 
> diff --git a/.gitignore b/.gitignore
> index 521d8f4fb4..b4f5433084 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -25,6 +25,7 @@
>  /git-bisect--helper
>  /git-blame
>  /git-branch
> +/git-bugreport
>  /git-bundle
>  /git-cat-file
>  /git-check-attr
> diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> new file mode 100644
> index 0000000000..c5f45bbee8
> --- /dev/null
> +++ b/Documentation/git-bugreport.txt
> @@ -0,0 +1,48 @@
> +git-bugreport(1)
> +================
> +
> +NAME
> +----
> +git-bugreport - Collect information for user to file a bug report
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git bugreport' [-o | --output <path>]
> +
> +DESCRIPTION
> +-----------
> +Captures information about the user's machine, Git client, and repository state,
> +as well as a form requesting information about the behavior the user observed,
> +into a single text file which the user can then share, for example to the Git
> +mailing list, in order to report an observed bug.
> +
> +The following information is requested from the user:
> +
> + - Reproduction steps
> + - Expected behavior
> + - Actual behavior
> +
> +The following information is captured automatically:
> +
> + - Git version (`git version --build-options`)
> + - Machine information (`uname -a`)
> + - Versions of various dependencies
> + - Git config contents (`git config --show-origin --list`)
> + - The contents of all configured git-hooks in `.git/hooks/`
> + - The contents of `.git/logs`
> +
> +OPTIONS
> +-------
> +-o [<path>]::
> +--output [<path>]::
> +	Place the resulting bug report file in <path> instead of the root of the
> +	Git repository.
> +
> +NOTE
> +----
> +Bug reports can be sent to git@vger.kernel.org.
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index f9255344ae..69801a1c45 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -606,6 +606,7 @@ TEST_PROGRAMS_NEED_X =
>  unexport CDPATH
>  
>  SCRIPT_SH += git-bisect.sh
> +SCRIPT_SH += git-bugreport.sh
>  SCRIPT_SH += git-difftool--helper.sh
>  SCRIPT_SH += git-filter-branch.sh
>  SCRIPT_SH += git-merge-octopus.sh
> diff --git a/command-list.txt b/command-list.txt
> index a9ac72bef4..be5a605047 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -54,6 +54,7 @@ git-archive                             mainporcelain
>  git-bisect                              mainporcelain           info
>  git-blame                               ancillaryinterrogators          complete
>  git-branch                              mainporcelain           history
> +git-bugreport                           ancillaryinterrogators
>  git-bundle                              mainporcelain
>  git-cat-file                            plumbinginterrogators
>  git-check-attr                          purehelpers
> diff --git a/git-bugreport.sh b/git-bugreport.sh
> new file mode 100755
> index 0000000000..2200703a51
> --- /dev/null
> +++ b/git-bugreport.sh

At first I was alarmed by "What? another shell script?" but this command should
prioritize flexibility and extensibility over speed. Running multiple processes
shouldn't be too taxing for what we are trying to do here.

> @@ -0,0 +1,86 @@
> +#!/bin/sh
> +
> +print_filenames_and_content() {
> +while read -r line; do
> +	echo "$line";
> +	echo "========";
> +	cat "$line";
> +	echo;
> +done
> +}
> +
> +generate_report_text() {
> +
> +	# Generate a form for the user to fill out with their issue.
> +	gettextln "Thank you for filling out a Git bug report!"
> +	gettextln "Please answer the following questions to help us understand your issue."
> +	echo
> +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> +	echo
> +	gettextln "What did you expect to happen? (Expected behavior)"
> +	echo
> +	gettextln "What happened instead? (Actual behavior)"
> +	echo
> +	gettextln "Anything else you want to add:"
> +	echo
> +	gettextln "Please review the rest of the bug report below."
> +	gettextln "You can delete any lines you don't wish to send."
> +	echo
> +
> +	echo "[System Information]"
> +	git version --build-options
> +	uname -a
> +	curl-config --version
> +	ldd --version
> +	echo
> +
> +	echo "[Git Config]"
> +	# TODO: Pass this through grep -v to avoid users sending us their credentials.
> +	git config --show-origin --list
> +	echo

Config options to consider stripping out:

	*url*
	*pass* (anything "password" but also "sendmail.smtppass")

> +	echo "[Configured Hooks]"
> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> +	echo

Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
is a binary file?

> +
> +	echo "[Git Logs]"
> +	find "$GIT_DIR/logs" -type f | print_filenames_and_content
> +	echo

As mentioned before, I've sometimes found it helpful to know the data shape for the object
store. Having a few extra steps such as the following could be nice:

	echo "[Loose Objects]"
	for objdir in $(find "$GIT_DIR/objects/??" -type d)
	do
		echo "$objdir: $(ls $objdir | wc -l)"
	done
	echo

	echo "[Pack Data]"
	ls -l "$GIT_DIR/objects/pack"
	echo

	echo "[Object Info Data]"
	ls -lR "$GIT_DIR/objects/info"
	echo

	echo "[Alternates File]"
	echo "========"
	cat "$GIT_DIR/objects/info/alternates"
	echo

That last one will collect information on the commit-graph file, even if it is an
incremental file.

> +
> +}
> +
> +USAGE="[-o | --output <path>]"
> +
> +SUBDIRECTORY_OK=t
> +OPTIONS_SPEC=
> +. git-sh-setup
> +. git-sh-i18n
> +
> +basedir="$PWD"
> +while :
> +do
> +	case "$1" in
> +	-o|--output)
> +		shift
> +		basedir="$1"
> +		shift
> +		continue
> +		;;
> +	"")
> +		break
> +		;;
> +	*)
> +		usage
> +		;;
> +	esac
> +done
> +
> +
> +# Create bugreport file
> +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"
> +
> +generate_report_text >$BUGREPORT_FILE
> +
> +git_editor $BUGREPORT_FILE
> +
> +eval_gettextln "Your new bug report is in \$BUGREPORT_FILE."
> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> new file mode 100755
> index 0000000000..6eb2ee4f66
> --- /dev/null
> +++ b/t/t0091-bugreport.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash
> +
> +test_description='git bugreport'
> +
> +. ./test-lib.sh
> +
> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then
> +			read -r nextline
> +			if [-z $nextline]; then
> +				return 1;
> +			fi
> +		fi
> +	done
> +}
> +
> +test_expect_success 'creates a report with content in the right places' '
> +	git bugreport &&
> +	check_all_headers_populated <git-bugreport-* &&
> +	rm git-bugreport-*
> +'
> +
> +test_expect_success '--output puts the report in the provided dir' '
> +	mkdir foo/ &&
> +	git bugreport -o foo/ &&
> +	test -f foo/git-bugreport-* &&
> +	rm -fr foo/
> +'
> +
> +test_expect_success 'incorrect arguments abort with usage' '
> +	test_must_fail git bugreport --false 2>output &&
> +	grep usage output &&
> +	test ! -f git-bugreport-*
> +'
> +
> +test_done
> 

I think this is a great start, and I'll take some time later to try it out.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 14:15 ` Derrick Stolee
@ 2019-08-15 14:36   ` Junio C Hamano
  2019-08-15 22:52     ` Emily Shaffer
  2019-08-15 20:07   ` Johannes Schindelin
  2019-08-15 20:13   ` Emily Shaffer
  2 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-08-15 14:36 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Emily Shaffer, git

Derrick Stolee <stolee@gmail.com> writes:

> Config options to consider stripping out:
>
> 	*url*
> 	*pass* (anything "password" but also "sendmail.smtppass")

Blacklisting?  I wonder if users feel safer if these are limited to
known-benign ones.

>> +	echo "[Configured Hooks]"
>> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
>> +	echo
>
> Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> is a binary file?

This makes me feel very nervous.  $GIT_DIR/hooks/ are private and
people can hardcode credentials in them; $GIT_DIR/hooks/pre-foo may
be written toread from $GIT_DIR/hooks/mypassword with the knowledge
that there won't be any "mypassword" hook.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15  2:34 [PATCH] bugreport: add tool to generate debugging info Emily Shaffer
  2019-08-15 14:15 ` Derrick Stolee
@ 2019-08-15 18:10 ` Junio C Hamano
  2019-08-15 21:52   ` Emily Shaffer
  2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
  2 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-08-15 18:10 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> new file mode 100644
> index 0000000000..c5f45bbee8
> --- /dev/null
> +++ b/Documentation/git-bugreport.txt
> @@ -0,0 +1,48 @@
> +git-bugreport(1)
> +================
> +
> +NAME
> +----
> +git-bugreport - Collect information for user to file a bug report
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git bugreport' [-o | --output <path>]
> +
> +DESCRIPTION
> +-----------
> +Captures information about the user's machine, Git client, and repository state,
> +as well as a form requesting information about the behavior the user observed,
> +into a single text file which the user can then share, for example to the Git
> +mailing list, in order to report an observed bug.
> +
> +The following information is requested from the user:
> +
> + - Reproduction steps
> + - Expected behavior
> + - Actual behavior

+ - How the above two are different

It is often helpful to have users explain how the expected and
actual are different in their own words.

> +NOTE
> +----
> +Bug reports can be sent to git@vger.kernel.org.

I am not sure if this belongs here.

> diff --git a/git-bugreport.sh b/git-bugreport.sh
> new file mode 100755
> index 0000000000..2200703a51
> --- /dev/null
> +++ b/git-bugreport.sh
> @@ -0,0 +1,86 @@
> +#!/bin/sh
> +
> +print_filenames_and_content() {
> +while read -r line; do
> +	echo "$line";
> +	echo "========";
> +	cat "$line";
> +	echo;
> +done
> +}

Style.

 - have SP on both sides of ()
 - one more HT indent for the function body
 - "do" on its own line
 - no unnecessary semicolons when LF would do

You probably are better off asking xargs to do this instead of
relying on "read -r".

	find ... | xargs -n 1 sh -c 'echo "$1" && cat "$1"' -

or something like that, perhaps.

> +
> +generate_report_text() {
> +
> +	# Generate a form for the user to fill out with their issue.
> +	gettextln "Thank you for filling out a Git bug report!"
> +	gettextln "Please answer the following questions to help us understand your issue."
> +	echo
> +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> +	echo
> +	gettextln "What did you expect to happen? (Expected behavior)"
> +	echo
> +	gettextln "What happened instead? (Actual behavior)"
> +	echo
> +	gettextln "Anything else you want to add:"
> +	echo
> +	gettextln "Please review the rest of the bug report below."
> +	gettextln "You can delete any lines you don't wish to send."
> +	echo

Would we on the receiving end be able to tell these section headers
in translated to 47 different languages?  I am sure that i18n is
used here to encourage non-C-locale users to file bugs in their own
languages, but are we prepared to react to them?

> +	echo "[System Information]"
> +	git version --build-options
> +	uname -a
> +	curl-config --version
> +	ldd --version

"curl-config: command not found" may be clear enough, but would
there be a case where errors from two or more consecutive commands
in the above sequence would make the output confusing to the person
sitting on the receiving end?  Would it help, as a convention, to
always ahve "echo [what am I reporting]" before each of these
commands?

> +# Create bugreport file
> +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"

How portable is -Iminutes?

> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> new file mode 100755
> index 0000000000..6eb2ee4f66
> --- /dev/null
> +++ b/t/t0091-bugreport.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash

Make sure /bin/sh suffices to run the test script.

> +test_description='git bugreport'
> +
> +. ./test-lib.sh
> +
> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then

Documentation/CodingGuidelines

I am not sure if the traits this test script checks about the
contents of the output is all that interesting.  Whenever we add new
sections to the main command because we want other kinds of
information collected, we'd update this test script because
otherwise the test would fail, but would it result in quality
bugreport tool, or is it just additional busywork?

If we decide later that a header and its body needs to be separated
with a blank line for better readablity, the check done here would
also need to be updated, but again, that does not feel like anything
more than just busywork to me.

The tests for "-o" and unknown options do make sense, though.

Thanks.

> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then
> +			read -r nextline
> +			if [-z $nextline]; then
> +				return 1;
> +			fi
> +		fi
> +	done
> +}
> +
> +test_expect_success 'creates a report with content in the right places' '
> +	git bugreport &&
> +	check_all_headers_populated <git-bugreport-* &&
> +	rm git-bugreport-*
> +'
> +
> +test_expect_success '--output puts the report in the provided dir' '
> +	mkdir foo/ &&
> +	git bugreport -o foo/ &&
> +	test -f foo/git-bugreport-* &&
> +	rm -fr foo/
> +'
> +
> +test_expect_success 'incorrect arguments abort with usage' '
> +	test_must_fail git bugreport --false 2>output &&
> +	grep usage output &&
> +	test ! -f git-bugreport-*
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 14:15 ` Derrick Stolee
  2019-08-15 14:36   ` Junio C Hamano
@ 2019-08-15 20:07   ` Johannes Schindelin
  2019-08-15 22:24     ` Emily Shaffer
  2019-08-15 20:13   ` Emily Shaffer
  2 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-08-15 20:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Emily Shaffer, git

Hi,

On Thu, 15 Aug 2019, Derrick Stolee wrote:

> On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > new file mode 100755
> > index 0000000000..2200703a51
> > --- /dev/null
> > +++ b/git-bugreport.sh
>
> At first I was alarmed by "What? another shell script?" but this
> command should prioritize flexibility and extensibility over speed.
> Running multiple processes shouldn't be too taxing for what we are
> trying to do here.

Git for Windows sometimes receives bug reports about Bash not being able
to start (usually it is a DLL base address problem, related to the way
Cygwin and MSYS2 emulate `fork()`).

In such a case, `git bugreport` would only be able to offer a reason for
yet another bug report instead of adding useful metadata.

Something to keep in mind when deciding how to implement this command.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 14:15 ` Derrick Stolee
  2019-08-15 14:36   ` Junio C Hamano
  2019-08-15 20:07   ` Johannes Schindelin
@ 2019-08-15 20:13   ` Emily Shaffer
  2 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15 20:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Thu, Aug 15, 2019 at 10:15:24AM -0400, Derrick Stolee wrote:

> >  - Do we want to advertise the Git mailing list for bug reports?
> 
> That is possible. Isn't there another mailing list for git users?

I know there's an IRC channel for Git users, I dunno about mailing list.
I'm worried that places I name as points of contact will grow stale,
although I suppose #git on freenode isn't really going anywhere, for
example. (But as a counterexample, I hope that nobody sends something
like this to #git-devel, which seems to have a lower population than
this mailing list.)

> 
> I could see a patch added on top of this for git-for-windows/git that
> changes the instructions to create issues on GitHub.

Indeed; I imagine we'd probably like to patch it to ask for bugs in our
bug tracker.

> 
> >  - Which config options should we filter to avoid accidentally receiving
> >    credentials?
> 
> The remote URLs are pretty sensitive. Not only do users sometimes put passwords
> or PATs into their URLs, the literal name of the repo could be a secret.

Now here's where I start to wonder. We often debug internally by asking
for the remote URL and replicating the issue there, which is why I
mentioned it explicitly in the commit message. But I hadn't considered
folks including the password in the URL.

Well, I suppose anybody can keep a local patch to change the config
filter pattern. I'll try to make it easy to spot and modify.

[snip]

> At first I was alarmed by "What? another shell script?" but this command should
> prioritize flexibility and extensibility over speed. Running multiple processes
> shouldn't be too taxing for what we are trying to do here.

If shell scripts are entirely deprecated I can convert it, but doing it
in C seemed like overkill when I really just wanted "what are all the
commands we would ask the user to run and tell us the output?". I
figured also that it would be a little more immune to bitrot to output
the contents of porcelain commands here.

> > +	echo "[Git Config]"
> > +	# TODO: Pass this through grep -v to avoid users sending us their credentials.
> > +	git config --show-origin --list
> > +	echo
> 
> Config options to consider stripping out:
> 
> 	*url*
> 	*pass* (anything "password" but also "sendmail.smtppass")

Done, thanks.

> 
> > +	echo "[Configured Hooks]"
> > +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> > +	echo
> 
> Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> is a binary file?

Yeah, I'm sure it will. I'll add a check to
print_filenames_and_content() so it can tell us if there is a hook
installed there, even if we can't see the content.

> 
> > +
> > +	echo "[Git Logs]"
> > +	find "$GIT_DIR/logs" -type f | print_filenames_and_content
> > +	echo
> 
> As mentioned before, I've sometimes found it helpful to know the data shape for the object
> store. Having a few extra steps such as the following could be nice:
> 
> 	echo "[Loose Objects]"
> 	for objdir in $(find "$GIT_DIR/objects/??" -type d)
> 	do
> 		echo "$objdir: $(ls $objdir | wc -l)"
`echo "$objdir: $(ls $objdir | wc -l) objects`
I'll add context so we don't need to have the bugreport script memorized
in order to read a bugreport :)
> 	done
> 	echo
> 
> 	echo "[Pack Data]"
> 	ls -l "$GIT_DIR/objects/pack"
> 	echo
> 
> 	echo "[Object Info Data]"
> 	ls -lR "$GIT_DIR/objects/info"
> 	echo
> 
> 	echo "[Alternates File]"
> 	echo "========"
> 	cat "$GIT_DIR/objects/info/alternates"
> 	echo
> 
> That last one will collect information on the commit-graph file, even if it is an
> incremental file.

Thanks Stolee, these are awesome and exactly the kind of feedback I was
hoping for.

> 
> I think this is a great start, and I'll take some time later to try it out.
> 
> Thanks,
> -Stolee

Awesome. I'm excited to hear how it goes.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 18:10 ` Junio C Hamano
@ 2019-08-15 21:52   ` Emily Shaffer
  2019-08-15 22:29     ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15 21:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 15, 2019 at 11:10:12AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> > new file mode 100644
> > index 0000000000..c5f45bbee8
> > --- /dev/null
> > +++ b/Documentation/git-bugreport.txt
> > @@ -0,0 +1,48 @@
> > +git-bugreport(1)
> > +================
> > +
> > +NAME
> > +----
> > +git-bugreport - Collect information for user to file a bug report
> > +
> > +SYNOPSIS
> > +--------
> > +[verse]
> > +'git bugreport' [-o | --output <path>]
> > +
> > +DESCRIPTION
> > +-----------
> > +Captures information about the user's machine, Git client, and repository state,
> > +as well as a form requesting information about the behavior the user observed,
> > +into a single text file which the user can then share, for example to the Git
> > +mailing list, in order to report an observed bug.
> > +
> > +The following information is requested from the user:
> > +
> > + - Reproduction steps
> > + - Expected behavior
> > + - Actual behavior
> 
> + - How the above two are different
> 
> It is often helpful to have users explain how the expected and
> actual are different in their own words.

Good point; added.

> 
> > +NOTE
> > +----
> > +Bug reports can be sent to git@vger.kernel.org.
> 
> I am not sure if this belongs here.

Sure, I wasn't certain either. Would you rather I remove the "what to do
with this bugreport" NOTE section entirely? I worry that folks will
think this generated text file is being magically reported behind the
scenes, or that they won't know what to do with the file now that it's
created.

I suppose I did mention in the DESCRIPTION section that the user can
share the generated text file to report a bug. Maybe I should remove
"for example to the Git ML" from there as well.

> 
> > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > new file mode 100755
> > index 0000000000..2200703a51
> > --- /dev/null
> > +++ b/git-bugreport.sh
> > @@ -0,0 +1,86 @@
> > +#!/bin/sh
> > +
> > +print_filenames_and_content() {
> > +while read -r line; do
> > +	echo "$line";
> > +	echo "========";
> > +	cat "$line";
> > +	echo;
> > +done
> > +}
> 
> Style.
> 
>  - have SP on both sides of ()
>  - one more HT indent for the function body
>  - "do" on its own line
>  - no unnecessary semicolons when LF would do
> 
> You probably are better off asking xargs to do this instead of
> relying on "read -r".
> 
> 	find ... | xargs -n 1 sh -c 'echo "$1" && cat "$1"' -
> 
> or something like that, perhaps.

Hm. In responding to Stolee's comments, I ended up replacing "cat $line"
with another function, and it seems xargs isn't happy directing to
functions which haven't been exported, because it spins a new process
for each argument it's passed. I had originally intended to do this with
xargs but then didn't want to pack many lines into one xargs call.

The way the code is in PS1 I think it's a reasonable replacement, but
the code in PS2 is less so - there's a check for whether a file given is
binary or whether it exists at all (the former for hooks, which I think
you were worried about, and the latter for the alternates file, which I
found I do not have.)

Why is "read -r" unreliable? Is it better to export a function and use
xargs, compared to using "read -r" in a loop like this? I was worried
that exporting a function is similar to namespace pollution in C++, but
maybe I'm mistaken.

> 
> > +
> > +generate_report_text() {
> > +
> > +	# Generate a form for the user to fill out with their issue.
> > +	gettextln "Thank you for filling out a Git bug report!"
> > +	gettextln "Please answer the following questions to help us understand your issue."
> > +	echo
> > +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> > +	echo
> > +	gettextln "What did you expect to happen? (Expected behavior)"
> > +	echo
> > +	gettextln "What happened instead? (Actual behavior)"
> > +	echo
> > +	gettextln "Anything else you want to add:"
> > +	echo
> > +	gettextln "Please review the rest of the bug report below."
> > +	gettextln "You can delete any lines you don't wish to send."
> > +	echo
> 
> Would we on the receiving end be able to tell these section headers
> in translated to 47 different languages?  I am sure that i18n is
> used here to encourage non-C-locale users to file bugs in their own
> languages, but are we prepared to react to them?

I hope that we could accept reports in many languages with online
translation as a guide, but practically speaking,
machine-learning-powered online translation may not be as good for
technical topics as it is for conversational.

At the same time, it seems unnecessarily restrictive to ask for bugs to
come in English. My heart thinks it's better to encourage other
languages too.

> 
> > +	echo "[System Information]"
> > +	git version --build-options
> > +	uname -a
> > +	curl-config --version
> > +	ldd --version
> 
> "curl-config: command not found" may be clear enough, but would
> there be a case where errors from two or more consecutive commands
> in the above sequence would make the output confusing to the person
> sitting on the receiving end?  Would it help, as a convention, to
> always ahve "echo [what am I reporting]" before each of these
> commands?

Yeah, I think that's a good point.

In responding to Stolee's review, I mentioned adding additional context
to some command's output so that "those reviewing the report don't need
to be intimately familiar with what the bugreport script runs". But it
might actually be most useful to echo the command that's being run
everywhere.

I imagine it's fine to include the [] header as well, to make it easier
for the user to understand what was generated.

> 
> > +# Create bugreport file
> > +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"
> 
> How portable is -Iminutes?

Ah, it seems while it works in MingW and bash, it doesn't work in zsh.
I'll figure something else out. Thanks.

> 
> > diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> > new file mode 100755
> > index 0000000000..6eb2ee4f66
> > --- /dev/null
> > +++ b/t/t0091-bugreport.sh
> > @@ -0,0 +1,41 @@
> > +#!/bin/bash
> 
> Make sure /bin/sh suffices to run the test script.

Will do. Thanks.

> 
> > +test_description='git bugreport'
> > +
> > +. ./test-lib.sh
> > +
> > +# Headers "[System Info]" will be followed by a non-empty line if we put some
> > +# information there; we can make sure all our headers were followed by some
> > +# information to check if the command was successful.
> > +HEADER_PATTERN="^\[.*\]$"
> > +check_all_headers_populated() {
> > +	while read -r line; do
> > +		if [$(grep $HEADER_PATTERN $line)]; then
> 
> Documentation/CodingGuidelines
> 
> I am not sure if the traits this test script checks about the
> contents of the output is all that interesting.  Whenever we add new
> sections to the main command because we want other kinds of
> information collected, we'd update this test script because
> otherwise the test would fail, but would it result in quality
> bugreport tool, or is it just additional busywork?

I'm a little confused about this - I checked for a general [...] header
standing alone on a line specifically so that we wouldn't need to update
the test script when adding a new section. What do you see that will
grow stale?

> 
> If we decide later that a header and its body needs to be separated
> with a blank line for better readablity, the check done here would
> also need to be updated, but again, that does not feel like anything
> more than just busywork to me.

Sure, I agree with that.

So, what's your suggestion? Not to check the output at all? (This may
actually be fine; it occurred to me while reading your review that if a
user is filing a bug report about something, one of the diagnostic
commands in bugreport might be what's broken for them. So perhaps it
should be tolerant to missing information...)

> 
> The tests for "-o" and unknown options do make sense, though.
> 
> Thanks.
Thanks for reviewing.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 20:07   ` Johannes Schindelin
@ 2019-08-15 22:24     ` Emily Shaffer
  2019-08-16 20:19       ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15 22:24 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Derrick Stolee, git

On Thu, Aug 15, 2019 at 10:07:57PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 15 Aug 2019, Derrick Stolee wrote:
> 
> > On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > > new file mode 100755
> > > index 0000000000..2200703a51
> > > --- /dev/null
> > > +++ b/git-bugreport.sh
> >
> > At first I was alarmed by "What? another shell script?" but this
> > command should prioritize flexibility and extensibility over speed.
> > Running multiple processes shouldn't be too taxing for what we are
> > trying to do here.
> 
> Git for Windows sometimes receives bug reports about Bash not being able
> to start (usually it is a DLL base address problem, related to the way
> Cygwin and MSYS2 emulate `fork()`).

Hmm. In a case like this, then, how is someone using GfW at all?
Embarrassingly, I don't actually have a way to try it out for myself.
It seems there's no GUI, and users are using it through the command line
in mingw, so when you say "bash doesn't start" do you mean "users can't
use the mingw prompt at all"?

> 
> In such a case, `git bugreport` would only be able to offer a reason for
> yet another bug report instead of adding useful metadata.
> 
> Something to keep in mind when deciding how to implement this command.
> 
> Ciao,
> Dscho

Yeah, that's an interesting point, thanks for bringing it up. So, in the
case when bash won't start at all, what does that failure look like? How
much of Git can users still use? For example, would they be able to get
far enough to run a binary git-bugreport?

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 21:52   ` Emily Shaffer
@ 2019-08-15 22:29     ` Junio C Hamano
  2019-08-15 22:54       ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-08-15 22:29 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

>> > +NOTE
>> > +----
>> > +Bug reports can be sent to git@vger.kernel.org.
>> 
>> I am not sure if this belongs here.
>
> Sure, I wasn't certain either. Would you rather I remove the "what to do
> with this bugreport" NOTE section entirely?

Not really.  You are invoking an editor to let the user edit the
pre-populated report, and I would imagine that a comment in that
file would be the best place to give instructions, not a manpage
for "git bugreport" command.

> So, what's your suggestion? Not to check the output at all? (This may
> actually be fine; it occurred to me while reading your review that if a
> user is filing a bug report about something, one of the diagnostic
> commands in bugreport might be what's broken for them. So perhaps it
> should be tolerant to missing information...)

Right.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 14:36   ` Junio C Hamano
@ 2019-08-15 22:52     ` Emily Shaffer
  2019-08-15 23:40       ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15 22:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git

On Thu, Aug 15, 2019 at 07:36:57AM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
> > Config options to consider stripping out:
> >
> > 	*url*
> > 	*pass* (anything "password" but also "sendmail.smtppass")
> 
> Blacklisting?  I wonder if users feel safer if these are limited to
> known-benign ones.

I think a whitelist of config options to print would grow stale
immediately, and the options we're missing would be very likely to be
configs to turn on new experimental features - which is probably what we
most want the bugreport for.

> 
> >> +	echo "[Configured Hooks]"
> >> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> >> +	echo
> >
> > Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> > is a binary file?
> 
> This makes me feel very nervous.  $GIT_DIR/hooks/ are private and
> people can hardcode credentials in them; $GIT_DIR/hooks/pre-foo may
> be written toread from $GIT_DIR/hooks/mypassword with the knowledge
> that there won't be any "mypassword" hook.

Hmm. I think the list of valid hooks isn't one that changes often, but
it's also not enumerated in some machine-parseable way - it exists in
Documentation/githooks.txt but that's all. I'd still be a little worried
about bitrot... I think it's probably better to list the filenames in
$GIT_DIR/hooks but not print their contents. I'll modify it.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 22:29     ` Junio C Hamano
@ 2019-08-15 22:54       ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-15 22:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 15, 2019 at 03:29:24PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> >> > +NOTE
> >> > +----
> >> > +Bug reports can be sent to git@vger.kernel.org.
> >> 
> >> I am not sure if this belongs here.
> >
> > Sure, I wasn't certain either. Would you rather I remove the "what to do
> > with this bugreport" NOTE section entirely?
> 
> Not really.  You are invoking an editor to let the user edit the
> pre-populated report, and I would imagine that a comment in that
> file would be the best place to give instructions, not a manpage
> for "git bugreport" command.

Oh, I see! In that case, do you still want the Git mailing list address
shown in the report text?

> 
> > So, what's your suggestion? Not to check the output at all? (This may
> > actually be fine; it occurred to me while reading your review that if a
> > user is filing a bug report about something, one of the diagnostic
> > commands in bugreport might be what's broken for them. So perhaps it
> > should be tolerant to missing information...)
> 
> Right.

Ok, will do.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 22:52     ` Emily Shaffer
@ 2019-08-15 23:40       ` Junio C Hamano
  2019-08-16  1:25         ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-08-15 23:40 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Derrick Stolee, git

Emily Shaffer <emilyshaffer@google.com> writes:

> I think a whitelist of config options to print would grow stale
> immediately, and the options we're missing would be very likely to be
> configs to turn on new experimental features - which is probably what we
> most want the bugreport for.

The implementation of your "git bugreport" command comes from the
same version of Git as such an experimental feature you want
feedback to is shipped with, so I am not sure where your concern
about staleness comes from.

If you really care about getting quality reports, you need to earn
users' trust (one way of doing so would be to maintain a good "list
of benign configuration knobs whose values help diagnosing issues"),
and you need to make sure we obtain relevant pieces of information.
Both of these things are something you need to actively work on.

We have trained ourselves to the point that we consider it a bug if
you add a new command git-bugreport without adding a corresponding
pattern in the .gitignore file.  And thanks to that discipline, we
have been fairly good at keeping .gitignore up to date.  I do not
see a reason why we cannot do something similar to the registry of
configuration variables that we care about, which may be shipped as
part of "git bugreport" command.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 23:40       ` Junio C Hamano
@ 2019-08-16  1:25         ` Emily Shaffer
  2019-08-16 16:41           ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-08-16  1:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git

On Thu, Aug 15, 2019 at 04:40:50PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > I think a whitelist of config options to print would grow stale
> > immediately, and the options we're missing would be very likely to be
> > configs to turn on new experimental features - which is probably what we
> > most want the bugreport for.
> 
> The implementation of your "git bugreport" command comes from the
> same version of Git as such an experimental feature you want
> feedback to is shipped with, so I am not sure where your concern
> about staleness comes from.
> 
> If you really care about getting quality reports, you need to earn
> users' trust (one way of doing so would be to maintain a good "list
> of benign configuration knobs whose values help diagnosing issues"),
> and you need to make sure we obtain relevant pieces of information.
> Both of these things are something you need to actively work on.
> 
> We have trained ourselves to the point that we consider it a bug if
> you add a new command git-bugreport without adding a corresponding
> pattern in the .gitignore file.  And thanks to that discipline, we
> have been fairly good at keeping .gitignore up to date.  I do not
> see a reason why we cannot do something similar to the registry of
> configuration variables that we care about, which may be shipped as
> part of "git bugreport" command.

I think comparing this habit to the .gitignore isn't quite fair -
.gitignore tells me I forgot to add my new command binary to it, when I
run `git status` to see what I need to add to my commit with new
command.

But, I agree that it's not going to be possible to create a completely
leakproof blacklisting heuristic here. So I'll use a whitelist for the
next patchset.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-16  1:25         ` Emily Shaffer
@ 2019-08-16 16:41           ` Junio C Hamano
  2019-08-16 19:08             ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-08-16 16:41 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Derrick Stolee, git

Emily Shaffer <emilyshaffer@google.com> writes:

> I think comparing this habit to the .gitignore isn't quite fair -
> .gitignore tells me I forgot to add my new command binary to it, when I
> run `git status` to see what I need to add to my commit with new
> command.

That is why I said that we need to actively work on, if we care
about getting quality reports.

I do not think it is unreasonable to expect the build procedure for
"git bugreport" to involve scanning in Documentation/config/ to pick
up variable names, annotated in such a way that is invisible to
AsciiDoc to allow us tell which ones are sensitive and which ones
are not.  A test in t/ could even check if a documented
configuration variable has such an annotation.  A commit that adds
configuration variables without documentiong them does exist, but
variables without documentation are (1) bugs, and (2) are not worth
serious engineering effort on until they get documented.

I am not saying that you must implement the whitelist exactly like
the above.  I am not even saying that this must be whitelist and not
blacklist---you'd have the same issue maintaining the list anyway.

The above is merely to illustrate the reason why I think that the
kind of "active work" to make sure that the list will not go stale
would be feasible.

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-16 16:41           ` Junio C Hamano
@ 2019-08-16 19:08             ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-16 19:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git

On Fri, Aug 16, 2019 at 09:41:41AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > I think comparing this habit to the .gitignore isn't quite fair -
> > .gitignore tells me I forgot to add my new command binary to it, when I
> > run `git status` to see what I need to add to my commit with new
> > command.
> 
> That is why I said that we need to actively work on, if we care
> about getting quality reports.
> 
> I do not think it is unreasonable to expect the build procedure for
> "git bugreport" to involve scanning in Documentation/config/ to pick
> up variable names, annotated in such a way that is invisible to
> AsciiDoc to allow us tell which ones are sensitive and which ones
> are not.  A test in t/ could even check if a documented
> configuration variable has such an annotation.  A commit that adds
> configuration variables without documentiong them does exist, but
> variables without documentation are (1) bugs, and (2) are not worth
> serious engineering effort on until they get documented.

Interesting. I think I have an idea for a way to do this, but it ends up
fairly large; I'll send proof of concept as a follow-on patch to this
one. Thanks for the suggestions.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] bugreport: add tool to generate debugging info
  2019-08-15 22:24     ` Emily Shaffer
@ 2019-08-16 20:19       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-08-16 20:19 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Derrick Stolee, git

Hi Emily,

On Thu, 15 Aug 2019, Emily Shaffer wrote:

> On Thu, Aug 15, 2019 at 10:07:57PM +0200, Johannes Schindelin wrote:
> >
> > On Thu, 15 Aug 2019, Derrick Stolee wrote:
> >
> > > On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > > > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > > > new file mode 100755
> > > > index 0000000000..2200703a51
> > > > --- /dev/null
> > > > +++ b/git-bugreport.sh
> > >
> > > At first I was alarmed by "What? another shell script?" but this
> > > command should prioritize flexibility and extensibility over speed.
> > > Running multiple processes shouldn't be too taxing for what we are
> > > trying to do here.
> >
> > Git for Windows sometimes receives bug reports about Bash not being able
> > to start (usually it is a DLL base address problem, related to the way
> > Cygwin and MSYS2 emulate `fork()`).
>
> Hmm. In a case like this, then, how is someone using GfW at all?

On Windows, there are native command-line interpreters available that
are _not_ Bash: PowerShell and CMD.

These days, you can get away with using Git _without_ a working Unix
shell most of the time. There are only preciously few commands left that
are still written as scripts, and most of these seem to be used less
often than one might think:

	- submodule
	- bisect
	- filter-branch
	- instaweb
	- mergetool
	- some uncommon merge strategies
	- rebase -p (already deprecated)
	- several CVS/Subversion/Arch/Quilt integrations
	- send-email (only relevant for mailing list centric projects,
	  again, not very common any longer)
	- request-pull (I would not be surprised if this is specific for
	  the Linux project, as if that project was even close to the
	  main user of Git)

So as long as you don't use submodules, and as long as you are unaware
of the `bisect` command, which would comprise the majority of Git users
in my experience, you can totally use Git for Windows without using Bash
or Perl, i.e. without using the Cygwin/MSYS2 runtime that seems to be
one of the common causes for trouble in Git for Windows.

> Embarrassingly, I don't actually have a way to try it out for myself.
> It seems there's no GUI, and users are using it through the command line
> in mingw, so when you say "bash doesn't start" do you mean "users can't
> use the mingw prompt at all"?

There is no MinGW prompt.

MinGW stands for "Minimal GNU on Windows", but it really refers to a way
to compile native Win32 programs via GCC. Meaning that some of the POSIX
functionality Unix/Linux developers have come to expect is unavailable.
For example, `fork()`. Which means that there is no native port of Bash
to Windows, at least not that _I_ am aware of.

Git for Windows comes with a "Git Bash", which is essentially a Bash
built against the MSYS2 runtime (i.e. it is much more like a Cygwin
executable than like a Win32 executable).

The common way to run Git for Windows is actually via `git.exe`,
independent of what particular command-line interpreter you prefer.

> > In such a case, `git bugreport` would only be able to offer a reason for
> > yet another bug report instead of adding useful metadata.
> >
> > Something to keep in mind when deciding how to implement this command.
>
> Yeah, that's an interesting point, thanks for bringing it up. So, in the
> case when bash won't start at all, what does that failure look like? How
> much of Git can users still use? For example, would they be able to get
> far enough to run a binary git-bugreport?

See above.

If I were you, I would not write `git bugreport` as anything but a
built-in, written in C. Unless there are _really_ good reasons not to
[*1*].  And quite honestly, I don't see any such reasons.

Ciao,
Dscho

Footnote *1*: One good reason to use the MSYS2 Bash would be to be able
to use MSYS2's POSIX emulation layer, e.g. to talk to the SSH agent
(that is also an MSYS2 program, meaning that it communicates via
emulated Unix sockets, which is a facility not available to native Win32
programs).

Another valid reason to use the MSYS2 Bash is to be able to talk to the
(emulated) pseudo TTYs provided by the MSYS2 runtime, e.g. when running
inside a MinTTY window, which is the default for Git Bash in Git for
Windows.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 0/2] add git-bugreport tool
  2019-08-15  2:34 [PATCH] bugreport: add tool to generate debugging info Emily Shaffer
  2019-08-15 14:15 ` Derrick Stolee
  2019-08-15 18:10 ` Junio C Hamano
@ 2019-08-17  0:39 ` Emily Shaffer
  2019-08-17  0:39   ` [PATCH v2 1/2] bugreport: add tool to generate debugging info Emily Shaffer
                     ` (2 more replies)
  2 siblings, 3 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-17  0:39 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Derrick Stolee, Johannes Schindelin, Junio C Hamano

Having read Dscho's response after already writing this reroll, I think
it is a compelling case to rewrite as a C builtin, although I'm worried
a little about the best way to continue to call porcelain commands to
protect against bitrot.

Regardless, I think the proof of concept in patch 2/2 still deserves
review; likely a reimplementation in C would see
git-bugreport-config-whitelist generated in a similar way, but as a
header instead of a textfile. So I'm still very interested in comments
on the direction and the Makefile changes there.

The patch 1/2 changes are primarily based on Stolee and Junio's
suggestions; thanks both. There's some rewording of manpage and commit
message, as well as fixes when I realized this command did not work as
expected from a worktree created with `git worktree add`. I also moved
the reflog contents to the very bottom of the output as they're likely
harmless, but very verbose; I didn't want the user to stop reading and
never examine the shorter and scarier generated output, like the config.

Thanks for thoughts.
 - Emily

Emily Shaffer (2):
  bugreport: add tool to generate debugging info
  bugreport: generate config whitelist based on docs

 .gitignore                             |   2 +
 Documentation/config/sendemail.txt     |  68 ++++++------
 Documentation/git-bugreport.txt        |  51 +++++++++
 Makefile                               |  10 +-
 bugreport-generate-config-whitelist.sh |   4 +
 command-list.txt                       |   1 +
 git-bugreport.sh                       | 139 +++++++++++++++++++++++++
 git-sh-setup.sh                        |   2 +
 t/t0091-bugreport.sh                   |  25 +++++
 9 files changed, 267 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/git-bugreport.txt
 create mode 100755 bugreport-generate-config-whitelist.sh
 create mode 100755 git-bugreport.sh
 create mode 100755 t/t0091-bugreport.sh

-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/2] bugreport: add tool to generate debugging info
  2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
@ 2019-08-17  0:39   ` Emily Shaffer
  2019-08-17  0:39   ` [PATCH v2 2/2] bugreport: generate config whitelist based on docs Emily Shaffer
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
  2 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-17  0:39 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Make it easier for users who encounter a bug to send a report by
collecting some state information, intended to be viewed by humans
familiar with Git.

Teach Git how to prompt the user for a good bug report: reproduction
steps, expected behavior, and actual behavior. Also, teach Git to write
down its own version, the version of some of its dependencies, the
operating system information, select entries from the gitconfig, the
configured hooks, and the contents of $GIT_DIR/logs. Finally, make sure
Git asks the user to review the contents of the report after it's
generated.

If users can send us a well-written bug report which contains diagnostic
information we would otherwise need to ask the user for, we can reduce
the number of question-and-answer round trips between the reporter and
the Git contributor.

Users may also wish to send a report like this to their local "Git
expert" if they have put their repository into a state they are confused
by.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                      |   1 +
 Documentation/git-bugreport.txt |  51 ++++++++++++
 Makefile                        |   1 +
 command-list.txt                |   1 +
 git-bugreport-config-whitelist  |  22 +++++
 git-bugreport.sh                | 139 ++++++++++++++++++++++++++++++++
 git-sh-setup.sh                 |   2 +
 t/t0091-bugreport.sh            |  25 ++++++
 8 files changed, 242 insertions(+)
 create mode 100644 Documentation/git-bugreport.txt
 create mode 100644 git-bugreport-config-whitelist
 create mode 100755 git-bugreport.sh
 create mode 100755 t/t0091-bugreport.sh

diff --git a/.gitignore b/.gitignore
index 521d8f4fb4..b4f5433084 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,6 +25,7 @@
 /git-bisect--helper
 /git-blame
 /git-branch
+/git-bugreport
 /git-bundle
 /git-cat-file
 /git-check-attr
diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
new file mode 100644
index 0000000000..2fc0a63192
--- /dev/null
+++ b/Documentation/git-bugreport.txt
@@ -0,0 +1,51 @@
+git-bugreport(1)
+================
+
+NAME
+----
+git-bugreport - Collect information for user to file a bug report
+
+SYNOPSIS
+--------
+[verse]
+'git bugreport' [-o | --output <path>]
+
+DESCRIPTION
+-----------
+Captures information about the user's machine, Git client, and repository state,
+as well as a form requesting information about the behavior the user observed,
+into a single text file which the user can then share, for example to the Git
+mailing list, in order to report an observed bug.
+
+The following information is requested from the user:
+
+ - Reproduction steps
+ - Expected behavior
+ - Actual behavior
+ - Difference between expected and actual behavior
+
+The following information is captured automatically:
+
+ - Git version (`git version --build-options`)
+ - Machine information (`uname -a`)
+ - Versions of various dependencies (curl, ldd)
+ - Relevant user environment variables ($SHELL)
+ - Some Git config variables, filtered by a whitelist
+   (`git config --show-origin --get`)
+ - A list of all configured git-hooks in `.git/hooks/` (but not their contents)
+ - A list of all loose objects in `.git/objects/`
+ - A list of all packs in `.git/objects/pack`
+ - A list of all object info in `.git/objects/info`
+ - The contents of the alternates file (`.git/objects/info/alternates`)
+ - The contents of `.git/logs`
+
+OPTIONS
+-------
+-o [<path>]::
+--output [<path>]::
+	Place the resulting bug report file in <path> instead of the root of the
+	Git repository.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index f9255344ae..69801a1c45 100644
--- a/Makefile
+++ b/Makefile
@@ -606,6 +606,7 @@ TEST_PROGRAMS_NEED_X =
 unexport CDPATH
 
 SCRIPT_SH += git-bisect.sh
+SCRIPT_SH += git-bugreport.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
diff --git a/command-list.txt b/command-list.txt
index a9ac72bef4..be5a605047 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -54,6 +54,7 @@ git-archive                             mainporcelain
 git-bisect                              mainporcelain           info
 git-blame                               ancillaryinterrogators          complete
 git-branch                              mainporcelain           history
+git-bugreport                           ancillaryinterrogators
 git-bundle                              mainporcelain
 git-cat-file                            plumbinginterrogators
 git-check-attr                          purehelpers
diff --git a/git-bugreport-config-whitelist b/git-bugreport-config-whitelist
new file mode 100644
index 0000000000..e4f07f7175
--- /dev/null
+++ b/git-bugreport-config-whitelist
@@ -0,0 +1,22 @@
+http.version
+protocol.version
+protocol.persistent-https.allow
+protocol.rpc.allow
+protocol.sso.allow
+submodule.repolike
+trace2.eventtarget
+trace2.configparams
+color.ui
+core.pager
+sendemail.smtpencryption
+sendemail.smtpserver
+sendemail.smtpserverport
+sendemail.smtpsslcertpath
+credential.helper
+merge.tool
+grep.linenumber
+rerere.enabled
+core.repositoryformatversion
+core.filemode
+core.bare
+core.logallrefupdates
diff --git a/git-bugreport.sh b/git-bugreport.sh
new file mode 100755
index 0000000000..0c182c62e9
--- /dev/null
+++ b/git-bugreport.sh
@@ -0,0 +1,139 @@
+#!/bin/sh
+
+print_file_if_exists () {
+	if [ -f "$1" ]
+	then
+		if $(file --mime "$1" | grep -q "charset=binary")
+		then
+			echo "<binary file>"
+		else
+			cat "$1"
+		fi
+	else
+		echo "<file does not exist>"
+	fi
+}
+
+
+print_filenames_and_content () {
+	while read -r line
+	do
+		echo "$line"
+		echo "========"
+		print_file_if_exists "$line"
+		echo
+	done
+}
+
+command_and_output () {
+	echo "$@:"
+	"$@"
+	echo
+}
+
+generate_report_text() {
+
+	# Generate a form for the user to fill out with their issue.
+	gettextln "Thank you for filling out a Git bug report!"
+	gettextln "Please answer the following questions to help us understand your issue."
+	echo
+	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
+	echo
+	gettextln "What did you expect to happen? (Expected behavior)"
+	echo
+	gettextln "What happened instead? (Actual behavior)"
+	echo
+	gettextln "What's different between what you expected and what actually happened?"
+	echo
+	gettextln "Anything else you want to add:"
+	echo
+	gettextln "Please review the rest of the bug report below."
+	gettextln "You can delete any lines you don't wish to send."
+	gettextln \
+"When you're done, you should include the contents of this file in your bug \n\
+report, for example by pasting it into an email to git@vger.kernel.org."
+	echo
+
+	# Versions of various necessary software should go here.
+	echo "[Version Information]"
+	command_and_output git version --build-options
+	command_and_output uname -a
+	command_and_output curl-config --version
+	command_and_output ldd --version
+	echo
+
+	# Values of environment variables we find useful should go here.
+	echo "[Environment Information]"
+	command_and_output echo $SHELL
+
+	echo "[Git Config]"
+	xargs -n 1 -I% sh -c 'echo %: $(git config --show-origin --get %)' \
+		<"$(git --exec-path)/git-bugreport-config-whitelist"
+	echo
+
+	echo "[Configured Hooks]"
+	find "$GIT_HOOKS_DIRECTORY" -type f | grep -v "\.sample$"
+	echo
+
+	echo "[Loose Objects]"
+	for objdir in $(find "$GIT_OBJECT_DIRECTORY"/?? -type d);
+	do
+		echo "$objdir: $(ls $objdir | wc -l) objects"
+	done
+	echo
+	
+	echo "[Pack Data]"
+	command_and_output ls -l "$GIT_OBJECT_DIRECTORY/pack"
+	echo
+	
+	echo "[Object Info Data]"
+	command_and_output ls -lR "$GIT_OBJECT_DIRECTORY/info"
+	echo
+	
+	echo "[Alternates File]"
+        echo "$GIT_OBJECT_DIRECTORY/info/alternates" | print_filenames_and_content
+	echo
+
+	# This command is the reflog; it's fairly chatty and hard to scan
+	# quickly, so leave it at the bottom so that users don't miss what
+	# happens before it.
+	echo "[Git Logs]"
+	find "$GIT_LOGS_DIRECTORY" -type f | print_filenames_and_content
+	echo
+}
+
+USAGE="[-o | --output <path>]"
+
+SUBDIRECTORY_OK=t
+OPTIONS_SPEC=
+. git-sh-setup
+. git-sh-i18n
+
+basedir="$PWD"
+while :
+do
+	case "$1" in
+	-o|--output)
+		shift
+		basedir="$1"
+		shift
+		continue
+		;;
+	"")
+		break
+		;;
+	*)
+		usage
+		;;
+	esac
+done
+
+
+# Create bugreport file
+BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date +%Y%m%d-%H%M)"
+
+generate_report_text >$BUGREPORT_FILE
+
+git_editor $BUGREPORT_FILE
+
+eval_gettextln "Your new bug report is in \$BUGREPORT_FILE."
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 10d9764185..1db8af0d76 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -379,6 +379,8 @@ git_dir_init () {
 		exit 1
 	}
 	: "${GIT_OBJECT_DIRECTORY="$(git rev-parse --git-path objects)"}"
+        : "${GIT_LOGS_DIRECTORY="$(git rev-parse --git-path logs)"}"
+        : "${GIT_HOOKS_DIRECTORY="$(git rev-parse --git-path hooks)"}"
 }
 
 if test -z "$NONGIT_OK"
diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
new file mode 100755
index 0000000000..e4e2d1a927
--- /dev/null
+++ b/t/t0091-bugreport.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description='git bugreport'
+
+. ./test-lib.sh
+
+test_expect_success 'creates a report without crashing' '
+	git bugreport &&
+	rm git-bugreport-*
+'
+
+test_expect_success '--output puts the report in the provided dir' '
+	mkdir foo/ &&
+	git bugreport -o foo/ &&
+	test -f foo/git-bugreport-* &&
+	rm -fr foo/
+'
+
+test_expect_success 'incorrect arguments abort with usage' '
+	test_must_fail git bugreport --false 2>output &&
+	grep usage output &&
+	test ! -f git-bugreport-*
+'
+
+test_done
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 2/2] bugreport: generate config whitelist based on docs
  2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
  2019-08-17  0:39   ` [PATCH v2 1/2] bugreport: add tool to generate debugging info Emily Shaffer
@ 2019-08-17  0:39   ` Emily Shaffer
  2019-08-17 20:38     ` Martin Ågren
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
  2 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-08-17  0:39 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Add a new step to the build to generate a whitelist of git-config
variables which are appropriate to include in the output of
git-bugreport. New variables can be added to the whitelist by annotating
their documentation in Documentation/config with the line
"// bugreport-include".

Some configs are private in nature, and can contain remote URLs,
passwords, or other sensitive information. In the event that a user
doesn't notice their information while reviewing a bugreport, that user
may leak their credentials to other individuals, mailing lists, or bug
tracking tools inadvertently. Heuristic blacklisting of configuration
keys is imperfect and prone to false negatives; given the nature of the
information which can be leaked, a whitelist is more reliable.

In order to prevent staleness of the whitelist, add a mechanism to
generate the whitelist from annotations in the config documentation,
where contributors are already used to documenting their new config
keys.

Additionally, add annotations to the sendemail config documentation in
order to demonstrate a proof of concept.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                             |  1 +
 Documentation/config/sendemail.txt     | 68 +++++++++++++-------------
 Makefile                               |  9 +++-
 bugreport-generate-config-whitelist.sh |  4 ++
 git-bugreport-config-whitelist         | 22 ---------
 5 files changed, 47 insertions(+), 57 deletions(-)
 create mode 100755 bugreport-generate-config-whitelist.sh
 delete mode 100644 git-bugreport-config-whitelist

diff --git a/.gitignore b/.gitignore
index b4f5433084..65cc74c748 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,6 +26,7 @@
 /git-blame
 /git-branch
 /git-bugreport
+/git-bugreport-config-whitelist
 /git-bundle
 /git-cat-file
 /git-check-attr
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 0006faf800..69f3e4f219 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -1,63 +1,63 @@
-sendemail.identity::
+sendemail.identity:: // bugreport-exclude
 	A configuration identity. When given, causes values in the
 	'sendemail.<identity>' subsection to take precedence over
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption::
+sendemail.smtpEncryption:: // bugreport-include
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
-sendemail.smtpssl (deprecated)::
+sendemail.smtpssl (deprecated):: // bugreport-exclude
 	Deprecated alias for 'sendemail.smtpEncryption = ssl'.
 
-sendemail.smtpsslcertpath::
+sendemail.smtpsslcertpath:: // bugreport-exclude
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*::
+sendemail.<identity>.*:: // bugreport-exclude
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
 	`sendemail.identity`.
 
-sendemail.aliasesFile::
-sendemail.aliasFileType::
-sendemail.annotate::
-sendemail.bcc::
-sendemail.cc::
-sendemail.ccCmd::
-sendemail.chainReplyTo::
-sendemail.confirm::
-sendemail.envelopeSender::
-sendemail.from::
-sendemail.multiEdit::
-sendemail.signedoffbycc::
-sendemail.smtpPass::
-sendemail.suppresscc::
-sendemail.suppressFrom::
-sendemail.to::
-sendemail.tocmd::
-sendemail.smtpDomain::
-sendemail.smtpServer::
-sendemail.smtpServerPort::
-sendemail.smtpServerOption::
-sendemail.smtpUser::
-sendemail.thread::
-sendemail.transferEncoding::
-sendemail.validate::
-sendemail.xmailer::
+sendemail.aliasesFile:: // bugreport-exclude
+sendemail.aliasFileType:: // bugreport-exclude
+sendemail.annotate:: // bugreport-include
+sendemail.bcc:: // bugreport-include
+sendemail.cc:: // bugreport-include
+sendemail.ccCmd:: // bugreport-include
+sendemail.chainReplyTo:: // bugreport-include
+sendemail.confirm:: // bugreport-include
+sendemail.envelopeSender:: // bugreport-include
+sendemail.from:: // bugreport-include
+sendemail.multiEdit:: // bugreport-include
+sendemail.signedoffbycc:: // bugreport-include
+sendemail.smtpPass:: // bugreport-exclude
+sendemail.suppresscc:: // bugreport-include
+sendemail.suppressFrom:: // bugreport-include
+sendemail.to:: // bugreport-include
+sendemail.tocmd:: // bugreport-include
+sendemail.smtpDomain:: // bugreport-include
+sendemail.smtpServer:: // bugreport-include
+sendemail.smtpServerPort:: // bugreport-include
+sendemail.smtpServerOption:: // bugreport-include
+sendemail.smtpUser:: // bugreport-exclude
+sendemail.thread:: // bugreport-include
+sendemail.transferEncoding:: // bugreport-include
+sendemail.validate:: // bugreport-include
+sendemail.xmailer:: // bugreport-include
 	See linkgit:git-send-email[1] for description.
 
-sendemail.signedoffcc (deprecated)::
+sendemail.signedoffcc (deprecated):: // bugreport-exclude
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize::
+sendemail.smtpBatchSize:: // bugreport-include
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay::
+sendemail.smtpReloginDelay:: // bugreport-include
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/Makefile b/Makefile
index 69801a1c45..a62bbefb8f 100644
--- a/Makefile
+++ b/Makefile
@@ -639,6 +639,10 @@ SCRIPT_PYTHON += git-p4.py
 SCRIPT_SH_GEN = $(patsubst %.sh,%,$(SCRIPT_SH))
 SCRIPT_PERL_GEN = $(patsubst %.perl,%,$(SCRIPT_PERL))
 SCRIPT_PYTHON_GEN = $(patsubst %.py,%,$(SCRIPT_PYTHON))
+SCRIPT_DEPENDENCIES = git-bugreport-config-whitelist
+
+$(SCRIPT_DEPENDENCIES): Documentation/config/*.txt
+	sh bugreport-generate-config-whitelist.sh
 
 # Individual rules to allow e.g.
 # "make -C ../.. SCRIPT_PERL=contrib/foo/bar.perl build-perl-script"
@@ -656,17 +660,20 @@ install-perl-script: $(SCRIPT_PERL_GEN)
 install-python-script: $(SCRIPT_PYTHON_GEN)
 	$(INSTALL) $^ '$(DESTDIR_SQ)$(gitexec_instdir_SQ)'
 
-.PHONY: clean-perl-script clean-sh-script clean-python-script
+.PHONY: clean-perl-script clean-sh-script clean-python-script clean-script-dependencies
 clean-sh-script:
 	$(RM) $(SCRIPT_SH_GEN)
 clean-perl-script:
 	$(RM) $(SCRIPT_PERL_GEN)
 clean-python-script:
 	$(RM) $(SCRIPT_PYTHON_GEN)
+clean-script-dependencies:
+	$(RM) $(SCRIPT_DEPENDENCIES)
 
 SCRIPTS = $(SCRIPT_SH_GEN) \
 	  $(SCRIPT_PERL_GEN) \
 	  $(SCRIPT_PYTHON_GEN) \
+	  $(SCRIPT_DEPENDENCIES) \
 	  git-instaweb
 
 ETAGS_TARGET = TAGS
diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
new file mode 100755
index 0000000000..ca6b232024
--- /dev/null
+++ b/bugreport-generate-config-whitelist.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \
+  >git-bugreport-config-whitelist
diff --git a/git-bugreport-config-whitelist b/git-bugreport-config-whitelist
deleted file mode 100644
index e4f07f7175..0000000000
--- a/git-bugreport-config-whitelist
+++ /dev/null
@@ -1,22 +0,0 @@
-http.version
-protocol.version
-protocol.persistent-https.allow
-protocol.rpc.allow
-protocol.sso.allow
-submodule.repolike
-trace2.eventtarget
-trace2.configparams
-color.ui
-core.pager
-sendemail.smtpencryption
-sendemail.smtpserver
-sendemail.smtpserverport
-sendemail.smtpsslcertpath
-credential.helper
-merge.tool
-grep.linenumber
-rerere.enabled
-core.repositoryformatversion
-core.filemode
-core.bare
-core.logallrefupdates
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] bugreport: generate config whitelist based on docs
  2019-08-17  0:39   ` [PATCH v2 2/2] bugreport: generate config whitelist based on docs Emily Shaffer
@ 2019-08-17 20:38     ` Martin Ågren
  2019-08-21 17:40       ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Martin Ågren @ 2019-08-17 20:38 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On Sat, 17 Aug 2019 at 02:42, Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Add a new step to the build to generate a whitelist of git-config
> variables which are appropriate to include in the output of
> git-bugreport. New variables can be added to the whitelist by annotating
> their documentation in Documentation/config with the line
> "// bugreport-include".

These "// bugreport-include" show up in the rendered manpages, both with
AsciiDoc and Asciidoctor. :-(

> --- a/Documentation/config/sendemail.txt
> +++ b/Documentation/config/sendemail.txt
> @@ -1,63 +1,63 @@
> -sendemail.identity::
> +sendemail.identity:: // bugreport-exclude
>         A configuration identity. When given, causes values in the

If I put each comment on a line of its own (after the config option, but
I suppose before would work the same way), Asciidoctor truly ignores
them and everything's fine. And AsciiDoc renders this one and others
like it ok.

> -sendemail.aliasesFile::
> -sendemail.aliasFileType::
> -sendemail.annotate::
> +sendemail.aliasesFile:: // bugreport-exclude
> +sendemail.aliasFileType:: // bugreport-exclude
> +sendemail.annotate:: // bugreport-include

However, AsciiDoc (version 8.6.10) seems to effectively replace those
comments with an empty line during processing, and it makes quite the
difference here. Instead of these appearing in a compact comma-separated
list, they are treated as individual items in the description list with
no supporting content.

FWIW, I like the idea of annotating things here to make it harder to
forget this whitelisting when adding a new config item.

Below is what I came up with as an alternative approach. Feel free to
steal, squash and/or ignore as you see fit.

BTW, I'm not sure the grep invocation is portable (-R ? -h ? -P ? -o ?).
We should probably also do the usual "create a candidate output file,
then move it into place" dance for robustness.

I do think we should test the generated whitelist in some minimal way,
e.g., to check that it does contain something which objectively belongs
in the whitelist and -- more importantly IMHO -- does *not* contain
something that shouldn't be there, such as sendemail.smtpPass.

Martin

-- >8 --
Subject: [PATCH] Use a bugreport:include/exclude macro instead

Implement a no-op "bugreport" macro for use as "bugreport:include[x]" to
annotate the config keys that should be included in the automatically
generated whitelist. Use "exclude" for the others.

With Asciidoctor, it's ok to say "bugreport:include[]", but AsciiDoc
seems to want something between the brackets. A bit unfortunate, but
not a huge problem -- we'll just provide an "x".

"doc-diff" reports that this macro doesn't render at all. That is,
these are both empty after this commit:

  cd Documentation
  ./doc-diff --asciidoctor :/"bugreport: add tool" HEAD
  ./doc-diff --asciidoc    :/"bugreport: add tool" HEAD

Diffing the rendered HTML shows that there is some small amount of
whitespace and comments added. That shouldn't be a problem.

We could perhaps let the implementation verify that the "action" is one
of "include" and "exclude". For the Asciidoctor implementation that
should be straightforward, but for AsciiDoc I don't immediately know how
to do it. Anyway, if someone stumbles on the keyboard and writes
"bugreport:icndule", they'll "only" miss out on the config key being
included in the whitelist. If this were a blacklist, the consequences of
a misspelled target could be a lot more severe.

Disclaimer: This is the outcome of a combination of copy-paste and
trial and error -- better solutions might be possible...

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I suppose this could also be "annotate:bugreport[ignore]" to avoid the
 "[x]" and to place all of this under a more general "annotate" macro.

 Documentation/asciidoc.conf             |  8 +++
 Documentation/asciidoctor-extensions.rb |  8 +++
 Documentation/config/sendemail.txt      | 68 ++++++++++++-------------
 bugreport-generate-config-whitelist.sh  |  2 +-
 4 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/Documentation/asciidoc.conf b/Documentation/asciidoc.conf
index 2c16c536ba..5a3c5ef028 100644
--- a/Documentation/asciidoc.conf
+++ b/Documentation/asciidoc.conf
@@ -6,9 +6,13 @@
 #
 # Show Git link as: <command>(<section>); if section is defined, else just show
 # the command.
+#
+# The bugreport macro does nothing as far as rendering is
+# concerned -- we just grep for it in the sources.
 
 [macros]
 (?su)[\\]?(?P<name>linkgit):(?P<target>\S*?)\[(?P<attrlist>.*?)\]=
+(?su)[\\]?(?P<name>bugreport):(?P<action>\S*?)\[(?P<attrlist>.*?)\]=
 
 [attributes]
 asterisk=&#42;
@@ -28,6 +32,8 @@ ifdef::backend-docbook[]
 {0#<citerefentry>}
 {0#<refentrytitle>{target}</refentrytitle><manvolnum>{0}</manvolnum>}
 {0#</citerefentry>}
+[bugreport-inlinemacro]
+{0#}
 endif::backend-docbook[]
 
 ifdef::backend-docbook[]
@@ -94,4 +100,6 @@ ifdef::backend-xhtml11[]
 git-relative-html-prefix=
 [linkgit-inlinemacro]
 <a href="{git-relative-html-prefix}{target}.html">{target}{0?({0})}</a>
+[bugreport-inlinemacro]
+<!-- -->
 endif::backend-xhtml11[]
diff --git a/Documentation/asciidoctor-extensions.rb b/Documentation/asciidoctor-extensions.rb
index 0089e0cfb8..54cbd5ddaf 100644
--- a/Documentation/asciidoctor-extensions.rb
+++ b/Documentation/asciidoctor-extensions.rb
@@ -20,9 +20,17 @@ module Git
         end
       end
     end
+    class BugReportProcessor < Asciidoctor::Extensions::InlineMacroProcessor
+      def process(parent, action, attrs)
+        ""
+      end
+    end
   end
 end
 
 Asciidoctor::Extensions.register do
   inline_macro Git::Documentation::LinkGitProcessor, :linkgit
+  # The bugreport macro does nothing as far as rendering is
+  # concerned -- we just grep for it in the sources.
+  inline_macro Git::Documentation::BugReportProcessor, :bugreport
 end
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 69f3e4f219..92f5082013 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -1,63 +1,63 @@
-sendemail.identity:: // bugreport-exclude
+sendemail.identity bugreport:exclude[x] ::
 	A configuration identity. When given, causes values in the
 	'sendemail.<identity>' subsection to take precedence over
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption:: // bugreport-include
+sendemail.smtpEncryption bugreport:include[x] ::
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
-sendemail.smtpssl (deprecated):: // bugreport-exclude
+sendemail.smtpssl (deprecated) bugreport:exclude[x] ::
 	Deprecated alias for 'sendemail.smtpEncryption = ssl'.
 
-sendemail.smtpsslcertpath:: // bugreport-exclude
+sendemail.smtpsslcertpath bugreport:exclude[x] ::
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*:: // bugreport-exclude
+sendemail.<identity>.* bugreport:exclude[x] ::
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
 	`sendemail.identity`.
 
-sendemail.aliasesFile:: // bugreport-exclude
-sendemail.aliasFileType:: // bugreport-exclude
-sendemail.annotate:: // bugreport-include
-sendemail.bcc:: // bugreport-include
-sendemail.cc:: // bugreport-include
-sendemail.ccCmd:: // bugreport-include
-sendemail.chainReplyTo:: // bugreport-include
-sendemail.confirm:: // bugreport-include
-sendemail.envelopeSender:: // bugreport-include
-sendemail.from:: // bugreport-include
-sendemail.multiEdit:: // bugreport-include
-sendemail.signedoffbycc:: // bugreport-include
-sendemail.smtpPass:: // bugreport-exclude
-sendemail.suppresscc:: // bugreport-include
-sendemail.suppressFrom:: // bugreport-include
-sendemail.to:: // bugreport-include
-sendemail.tocmd:: // bugreport-include
-sendemail.smtpDomain:: // bugreport-include
-sendemail.smtpServer:: // bugreport-include
-sendemail.smtpServerPort:: // bugreport-include
-sendemail.smtpServerOption:: // bugreport-include
-sendemail.smtpUser:: // bugreport-exclude
-sendemail.thread:: // bugreport-include
-sendemail.transferEncoding:: // bugreport-include
-sendemail.validate:: // bugreport-include
-sendemail.xmailer:: // bugreport-include
+sendemail.aliasesFile bugreport:exclude[x] ::
+sendemail.aliasFileType bugreport:exclude[x] ::
+sendemail.annotate bugreport:include[x] ::
+sendemail.bcc bugreport:include[x] ::
+sendemail.cc bugreport:include[x] ::
+sendemail.ccCmd bugreport:include[x] ::
+sendemail.chainReplyTo bugreport:include[x] ::
+sendemail.confirm bugreport:include[x] ::
+sendemail.envelopeSender bugreport:include[x] ::
+sendemail.from bugreport:include[x] ::
+sendemail.multiEdit bugreport:include[x] ::
+sendemail.signedoffbycc bugreport:include[x] ::
+sendemail.smtpPass bugreport:exclude[x] ::
+sendemail.suppresscc bugreport:include[x] ::
+sendemail.suppressFrom bugreport:include[x] ::
+sendemail.to bugreport:include[x] ::
+sendemail.tocmd bugreport:include[x] ::
+sendemail.smtpDomain bugreport:include[x] ::
+sendemail.smtpServer bugreport:include[x] ::
+sendemail.smtpServerPort bugreport:include[x] ::
+sendemail.smtpServerOption bugreport:include[x] ::
+sendemail.smtpUser bugreport:exclude[x] ::
+sendemail.thread bugreport:include[x] ::
+sendemail.transferEncoding bugreport:include[x] ::
+sendemail.validate bugreport:include[x] ::
+sendemail.xmailer bugreport:include[x] ::
 	See linkgit:git-send-email[1] for description.
 
-sendemail.signedoffcc (deprecated):: // bugreport-exclude
+sendemail.signedoffcc (deprecated) bugreport:exclude[x] ::
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize:: // bugreport-include
+sendemail.smtpBatchSize bugreport:include[x] ::
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay:: // bugreport-include
+sendemail.smtpReloginDelay bugreport:include[x] ::
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
index ca6b232024..896b838cfa 100755
--- a/bugreport-generate-config-whitelist.sh
+++ b/bugreport-generate-config-whitelist.sh
@@ -1,4 +1,4 @@
 #!/bin/sh
 
-grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \
+grep -RhPo ".*(?= +bugreport:include.* ::)" Documentation/config \
   >git-bugreport-config-whitelist
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] bugreport: generate config whitelist based on docs
  2019-08-17 20:38     ` Martin Ågren
@ 2019-08-21 17:40       ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-08-21 17:40 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Aug 17, 2019 at 10:38:46PM +0200, Martin Ågren wrote:
> On Sat, 17 Aug 2019 at 02:42, Emily Shaffer <emilyshaffer@google.com> wrote:
> >
> > Add a new step to the build to generate a whitelist of git-config
> > variables which are appropriate to include in the output of
> > git-bugreport. New variables can be added to the whitelist by annotating
> > their documentation in Documentation/config with the line
> > "// bugreport-include".
> 
> These "// bugreport-include" show up in the rendered manpages, both with
> AsciiDoc and Asciidoctor. :-(

Hmm, I don't see it in the troff or the HTML with asciidoc using `make`
- but I do see with asciidoctor, or `asciidoc -d html5`. Nice catch,
thanks.

> 
> > --- a/Documentation/config/sendemail.txt
> > +++ b/Documentation/config/sendemail.txt
> > @@ -1,63 +1,63 @@
> > -sendemail.identity::
> > +sendemail.identity:: // bugreport-exclude
> >         A configuration identity. When given, causes values in the
> 
> If I put each comment on a line of its own (after the config option, but
> I suppose before would work the same way), Asciidoctor truly ignores
> them and everything's fine. And AsciiDoc renders this one and others
> like it ok.
> 
> > -sendemail.aliasesFile::
> > -sendemail.aliasFileType::
> > -sendemail.annotate::
> > +sendemail.aliasesFile:: // bugreport-exclude
> > +sendemail.aliasFileType:: // bugreport-exclude
> > +sendemail.annotate:: // bugreport-include
> 
> However, AsciiDoc (version 8.6.10) seems to effectively replace those
> comments with an empty line during processing, and it makes quite the
> difference here. Instead of these appearing in a compact comma-separated
> list, they are treated as individual items in the description list with
> no supporting content.
> 
> FWIW, I like the idea of annotating things here to make it harder to
> forget this whitelisting when adding a new config item.
> 
> Below is what I came up with as an alternative approach. Feel free to
> steal, squash and/or ignore as you see fit.

This is cool - I'll add this to the commit chain and build on top of it.
Thanks!

> 
> BTW, I'm not sure the grep invocation is portable (-R ? -h ? -P ? -o ?).

Yeah, I'll see what I can do about that (recursive, no filename,
Perl-compatible regex, match only) - I can probably replace this with
something like `find | xargs pgrep` - I'll keep digging. Thanks.

> We should probably also do the usual "create a candidate output file,
> then move it into place" dance for robustness.
> 
> I do think we should test the generated whitelist in some minimal way,
> e.g., to check that it does contain something which objectively belongs
> in the whitelist and -- more importantly IMHO -- does *not* contain
> something that shouldn't be there, such as sendemail.smtpPass.

Good point, I'll add a test for this.

Thanks very much for this!
 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 0/9] add git-bugreport tool
  2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
  2019-08-17  0:39   ` [PATCH v2 1/2] bugreport: add tool to generate debugging info Emily Shaffer
  2019-08-17  0:39   ` [PATCH v2 2/2] bugreport: generate config whitelist based on docs Emily Shaffer
@ 2019-10-25  2:51   ` Emily Shaffer
  2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
                       ` (9 more replies)
  2 siblings, 10 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Derrick Stolee, Johannes Schindelin, Junio C Hamano

Thanks for the patience with the long wait, all. Here's an attempt at
the rewrite in C; I think it does verbatim what the sh version did
except that this doesn't print the reflog - Jonathan Nieder was good
enough to point out to me that folks probably don't want to share their
commit subjects all willy-nilly if they're working on something
secretive.

For the most part I hoped to do this stuff stupidly - that is to say,
independent of the libraries within Git - in an effort to both avoid
bitrot, and see what is (as opposed to what the libraries think should
be).

The one common patch with v2 is "generate config whitelist based on
docs", which I have not changed at all.

I look forward to everyone's suggestions.

 - Emily

Emily Shaffer (9):
  bugreport: add tool to generate debugging info
  bugreport: generate config whitelist based on docs
  bugreport: add version and system information
  bugreport: add config values from whitelist
  bugreport: collect list of populated hooks
  bugreport: count loose objects
  bugreport: add packed object summary
  bugreport: list contents of $OBJDIR/info
  bugreport: print contents of alternates file

 .gitignore                             |   1 +
 Documentation/config/sendemail.txt     |  68 +++---
 Makefile                               |  11 +-
 bugreport-generate-config-whitelist.sh |   4 +
 bugreport.c                            | 314 +++++++++++++++++++++++++
 bugreport.h                            |  44 ++++
 builtin.h                              |   1 +
 builtin/bugreport.c                    |  90 +++++++
 git.c                                  |   1 +
 9 files changed, 499 insertions(+), 35 deletions(-)
 create mode 100755 bugreport-generate-config-whitelist.sh
 create mode 100644 bugreport.c
 create mode 100644 bugreport.h
 create mode 100644 builtin/bugreport.c

-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-29 20:29       ` Josh Steadmon
  2019-11-16  3:11       ` Junio C Hamano
  2019-10-25  2:51     ` [PATCH v3 2/9] bugreport: generate config whitelist based on docs Emily Shaffer
                       ` (8 subsequent siblings)
  9 siblings, 2 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Teach Git how to prompt the user for a good bug report: reproduction
steps, expected behavior, and actual behavior. Later, Git can learn how
to collect some diagnostic information from the repository.

If users can send us a well-written bug report which contains diagnostic
information we would otherwise need to ask the user for, we can reduce
the number of question-and-answer round trips between the reporter and
the Git contributor.

Users may also wish to send a report like this to their local "Git
expert" if they have put their repository into a state they are confused
by.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Makefile            |  1 +
 builtin.h           |  1 +
 builtin/bugreport.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
 git.c               |  1 +
 4 files changed, 53 insertions(+)
 create mode 100644 builtin/bugreport.c

diff --git a/Makefile b/Makefile
index 58b92af54b..132e2a52da 100644
--- a/Makefile
+++ b/Makefile
@@ -1039,6 +1039,7 @@ BUILTIN_OBJS += builtin/archive.o
 BUILTIN_OBJS += builtin/bisect--helper.o
 BUILTIN_OBJS += builtin/blame.o
 BUILTIN_OBJS += builtin/branch.o
+BUILTIN_OBJS += builtin/bugreport.o
 BUILTIN_OBJS += builtin/bundle.o
 BUILTIN_OBJS += builtin/cat-file.o
 BUILTIN_OBJS += builtin/check-attr.o
diff --git a/builtin.h b/builtin.h
index 5cf5df69f7..c6373d3289 100644
--- a/builtin.h
+++ b/builtin.h
@@ -135,6 +135,7 @@ int cmd_archive(int argc, const char **argv, const char *prefix);
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix);
 int cmd_blame(int argc, const char **argv, const char *prefix);
 int cmd_branch(int argc, const char **argv, const char *prefix);
+int cmd_bugreport(int argc, const char **argv, const char *prefix);
 int cmd_bundle(int argc, const char **argv, const char *prefix);
 int cmd_cat_file(int argc, const char **argv, const char *prefix);
 int cmd_checkout(int argc, const char **argv, const char *prefix);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
new file mode 100644
index 0000000000..2ef16440a0
--- /dev/null
+++ b/builtin/bugreport.c
@@ -0,0 +1,50 @@
+#include "builtin.h"
+#include "stdio.h"
+#include "strbuf.h"
+#include "time.h"
+
+int get_bug_template(struct strbuf *template)
+{
+	const char template_text[] =
+"Thank you for filling out a Git bug report!\n"
+"Please answer the following questions to help us understand your issue.\n"
+"\n"
+"What did you do before the bug happened? (Steps to reproduce your issue)\n"
+"\n"
+"What did you expect to happen? (Expected behavior)\n"
+"\n"
+"What happened instead? (Actual behavior)\n"
+"\n"
+"What's different between what you expected and what actually happened?\n"
+"\n"
+"Anything else you want to add:\n"
+"\n"
+"Please review the rest of the bug report below.\n"
+"You can delete any lines you don't wish to send.\n";
+
+	strbuf_reset(template);
+	strbuf_add(template, template_text, strlen(template_text));
+	return 0;
+}
+
+int cmd_bugreport(int argc, const char **argv, const char *prefix)
+{
+	struct strbuf buffer = STRBUF_INIT;
+	struct strbuf report_path = STRBUF_INIT;
+	FILE *report;
+	time_t now = time(NULL);
+
+	strbuf_addstr(&report_path, "git-bugreport-");
+	strbuf_addftime(&report_path, "%F", gmtime(&now), 0, 0);
+	strbuf_addstr(&report_path, ".txt");
+
+	report = fopen_for_writing(report_path.buf);
+
+	get_bug_template(&buffer);
+	strbuf_write(&buffer, report);
+
+	fclose(report);
+
+	launch_editor(report_path.buf, NULL, NULL);
+	return 0;
+}
diff --git a/git.c b/git.c
index ce6ab0ece2..2d6a64f019 100644
--- a/git.c
+++ b/git.c
@@ -473,6 +473,7 @@ static struct cmd_struct commands[] = {
 	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
 	{ "blame", cmd_blame, RUN_SETUP },
 	{ "branch", cmd_branch, RUN_SETUP | DELAY_PAGER_CONFIG },
+	{ "bugreport", cmd_bugreport, RUN_SETUP },
 	{ "bundle", cmd_bundle, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "cat-file", cmd_cat_file, RUN_SETUP },
 	{ "check-attr", cmd_check_attr, RUN_SETUP },
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 2/9] bugreport: generate config whitelist based on docs
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
  2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 13:27       ` Johannes Schindelin
  2019-10-25  2:51     ` [PATCH v3 3/9] bugreport: add version and system information Emily Shaffer
                       ` (7 subsequent siblings)
  9 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Add a new step to the build to generate a whitelist of git-config
variables which are appropriate to include in the output of
git-bugreport. New variables can be added to the whitelist by annotating
their documentation in Documentation/config with the line
"// bugreport-include".

Some configs are private in nature, and can contain remote URLs,
passwords, or other sensitive information. In the event that a user
doesn't notice their information while reviewing a bugreport, that user
may leak their credentials to other individuals, mailing lists, or bug
tracking tools inadvertently. Heuristic blacklisting of configuration
keys is imperfect and prone to false negatives; given the nature of the
information which can be leaked, a whitelist is more reliable.

In order to prevent staleness of the whitelist, add a mechanism to
generate the whitelist from annotations in the config documentation,
where contributors are already used to documenting their new config
keys.

Additionally, add annotations to the sendemail config documentation in
order to demonstrate a proof of concept.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                             |  1 +
 Documentation/config/sendemail.txt     | 68 +++++++++++++-------------
 Makefile                               |  9 +++-
 bugreport-generate-config-whitelist.sh |  4 ++
 4 files changed, 47 insertions(+), 35 deletions(-)
 create mode 100755 bugreport-generate-config-whitelist.sh

diff --git a/.gitignore b/.gitignore
index 89b3b79c1a..055a84c4a8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,6 +25,7 @@
 /git-bisect--helper
 /git-blame
 /git-branch
+/git-bugreport
 /git-bundle
 /git-cat-file
 /git-check-attr
diff --git a/Documentation/config/sendemail.txt b/Documentation/config/sendemail.txt
index 0006faf800..69f3e4f219 100644
--- a/Documentation/config/sendemail.txt
+++ b/Documentation/config/sendemail.txt
@@ -1,63 +1,63 @@
-sendemail.identity::
+sendemail.identity:: // bugreport-exclude
 	A configuration identity. When given, causes values in the
 	'sendemail.<identity>' subsection to take precedence over
 	values in the 'sendemail' section. The default identity is
 	the value of `sendemail.identity`.
 
-sendemail.smtpEncryption::
+sendemail.smtpEncryption:: // bugreport-include
 	See linkgit:git-send-email[1] for description.  Note that this
 	setting is not subject to the 'identity' mechanism.
 
-sendemail.smtpssl (deprecated)::
+sendemail.smtpssl (deprecated):: // bugreport-exclude
 	Deprecated alias for 'sendemail.smtpEncryption = ssl'.
 
-sendemail.smtpsslcertpath::
+sendemail.smtpsslcertpath:: // bugreport-exclude
 	Path to ca-certificates (either a directory or a single file).
 	Set it to an empty string to disable certificate verification.
 
-sendemail.<identity>.*::
+sendemail.<identity>.*:: // bugreport-exclude
 	Identity-specific versions of the 'sendemail.*' parameters
 	found below, taking precedence over those when this
 	identity is selected, through either the command-line or
 	`sendemail.identity`.
 
-sendemail.aliasesFile::
-sendemail.aliasFileType::
-sendemail.annotate::
-sendemail.bcc::
-sendemail.cc::
-sendemail.ccCmd::
-sendemail.chainReplyTo::
-sendemail.confirm::
-sendemail.envelopeSender::
-sendemail.from::
-sendemail.multiEdit::
-sendemail.signedoffbycc::
-sendemail.smtpPass::
-sendemail.suppresscc::
-sendemail.suppressFrom::
-sendemail.to::
-sendemail.tocmd::
-sendemail.smtpDomain::
-sendemail.smtpServer::
-sendemail.smtpServerPort::
-sendemail.smtpServerOption::
-sendemail.smtpUser::
-sendemail.thread::
-sendemail.transferEncoding::
-sendemail.validate::
-sendemail.xmailer::
+sendemail.aliasesFile:: // bugreport-exclude
+sendemail.aliasFileType:: // bugreport-exclude
+sendemail.annotate:: // bugreport-include
+sendemail.bcc:: // bugreport-include
+sendemail.cc:: // bugreport-include
+sendemail.ccCmd:: // bugreport-include
+sendemail.chainReplyTo:: // bugreport-include
+sendemail.confirm:: // bugreport-include
+sendemail.envelopeSender:: // bugreport-include
+sendemail.from:: // bugreport-include
+sendemail.multiEdit:: // bugreport-include
+sendemail.signedoffbycc:: // bugreport-include
+sendemail.smtpPass:: // bugreport-exclude
+sendemail.suppresscc:: // bugreport-include
+sendemail.suppressFrom:: // bugreport-include
+sendemail.to:: // bugreport-include
+sendemail.tocmd:: // bugreport-include
+sendemail.smtpDomain:: // bugreport-include
+sendemail.smtpServer:: // bugreport-include
+sendemail.smtpServerPort:: // bugreport-include
+sendemail.smtpServerOption:: // bugreport-include
+sendemail.smtpUser:: // bugreport-exclude
+sendemail.thread:: // bugreport-include
+sendemail.transferEncoding:: // bugreport-include
+sendemail.validate:: // bugreport-include
+sendemail.xmailer:: // bugreport-include
 	See linkgit:git-send-email[1] for description.
 
-sendemail.signedoffcc (deprecated)::
+sendemail.signedoffcc (deprecated):: // bugreport-exclude
 	Deprecated alias for `sendemail.signedoffbycc`.
 
-sendemail.smtpBatchSize::
+sendemail.smtpBatchSize:: // bugreport-include
 	Number of messages to be sent per connection, after that a relogin
 	will happen.  If the value is 0 or undefined, send all messages in
 	one connection.
 	See also the `--batch-size` option of linkgit:git-send-email[1].
 
-sendemail.smtpReloginDelay::
+sendemail.smtpReloginDelay:: // bugreport-include
 	Seconds wait before reconnecting to smtp server.
 	See also the `--relogin-delay` option of linkgit:git-send-email[1].
diff --git a/Makefile b/Makefile
index 132e2a52da..78767ecdde 100644
--- a/Makefile
+++ b/Makefile
@@ -634,6 +634,10 @@ SCRIPT_PYTHON += git-p4.py
 SCRIPT_SH_GEN = $(patsubst %.sh,%,$(SCRIPT_SH))
 SCRIPT_PERL_GEN = $(patsubst %.perl,%,$(SCRIPT_PERL))
 SCRIPT_PYTHON_GEN = $(patsubst %.py,%,$(SCRIPT_PYTHON))
+SCRIPT_DEPENDENCIES = git-bugreport-config-whitelist
+
+$(SCRIPT_DEPENDENCIES): Documentation/config/*.txt
+	sh bugreport-generate-config-whitelist.sh
 
 # Individual rules to allow e.g.
 # "make -C ../.. SCRIPT_PERL=contrib/foo/bar.perl build-perl-script"
@@ -651,17 +655,20 @@ install-perl-script: $(SCRIPT_PERL_GEN)
 install-python-script: $(SCRIPT_PYTHON_GEN)
 	$(INSTALL) $^ '$(DESTDIR_SQ)$(gitexec_instdir_SQ)'
 
-.PHONY: clean-perl-script clean-sh-script clean-python-script
+.PHONY: clean-perl-script clean-sh-script clean-python-script clean-script-dependencies
 clean-sh-script:
 	$(RM) $(SCRIPT_SH_GEN)
 clean-perl-script:
 	$(RM) $(SCRIPT_PERL_GEN)
 clean-python-script:
 	$(RM) $(SCRIPT_PYTHON_GEN)
+clean-script-dependencies:
+	$(RM) $(SCRIPT_DEPENDENCIES)
 
 SCRIPTS = $(SCRIPT_SH_GEN) \
 	  $(SCRIPT_PERL_GEN) \
 	  $(SCRIPT_PYTHON_GEN) \
+	  $(SCRIPT_DEPENDENCIES) \
 	  git-instaweb
 
 ETAGS_TARGET = TAGS
diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
new file mode 100755
index 0000000000..ca6b232024
--- /dev/null
+++ b/bugreport-generate-config-whitelist.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \
+  >git-bugreport-config-whitelist
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 3/9] bugreport: add version and system information
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
  2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
  2019-10-25  2:51     ` [PATCH v3 2/9] bugreport: generate config whitelist based on docs Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 13:49       ` Johannes Schindelin
  2019-10-29 20:43       ` Josh Steadmon
  2019-10-25  2:51     ` [PATCH v3 4/9] bugreport: add config values from whitelist Emily Shaffer
                       ` (6 subsequent siblings)
  9 siblings, 2 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Teach bugreport how to collect the Git, curl, and ldd versions, as well
as the uname string.

Learning the uname and versions in the user's environment will enable us
to reproduce behavior reported by the user; the versions will also give
us a good starting place while bisecting a newly introduced bug.

Put this functionality in a library, so that other commands can easily
include this debug info if necessary.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Makefile            |  1 +
 bugreport.c         | 55 +++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  7 ++++++
 builtin/bugreport.c | 13 +++++++++++
 4 files changed, 76 insertions(+)
 create mode 100644 bugreport.c
 create mode 100644 bugreport.h

diff --git a/Makefile b/Makefile
index 78767ecdde..045b22cb67 100644
--- a/Makefile
+++ b/Makefile
@@ -844,6 +844,7 @@ LIB_OBJS += bisect.o
 LIB_OBJS += blame.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
+LIB_OBJS += bugreport.o
 LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
diff --git a/bugreport.c b/bugreport.c
new file mode 100644
index 0000000000..ada54fe583
--- /dev/null
+++ b/bugreport.c
@@ -0,0 +1,55 @@
+#include "cache.h"
+
+#include "bugreport.h"
+#include "help.h"
+#include "run-command.h"
+#include "version.h"
+
+void get_system_info(struct strbuf *sys_info)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf std_out = STRBUF_INIT;
+
+	strbuf_reset(sys_info);
+
+	// get git version from native cmd
+	strbuf_addstr(sys_info, "git version: ");
+	strbuf_addstr(sys_info, git_version_string);
+	strbuf_complete_line(sys_info);
+
+	// system call for other version info
+	argv_array_push(&cp.args, "uname");
+	argv_array_push(&cp.args, "-a");
+	capture_command(&cp, &std_out, 0);
+
+	strbuf_addstr(sys_info, "uname -a: ");
+	strbuf_addbuf(sys_info, &std_out);
+	strbuf_complete_line(sys_info);
+
+	argv_array_clear(&cp.args);
+	strbuf_reset(&std_out);
+
+
+	argv_array_push(&cp.args, "curl-config");
+	argv_array_push(&cp.args, "--version");
+	capture_command(&cp, &std_out, 0);
+
+	strbuf_addstr(sys_info, "curl-config --version: ");
+	strbuf_addbuf(sys_info, &std_out);
+	strbuf_complete_line(sys_info);
+
+	argv_array_clear(&cp.args);
+	strbuf_reset(&std_out);
+
+
+	argv_array_push(&cp.args, "ldd");
+	argv_array_push(&cp.args, "--version");
+	capture_command(&cp, &std_out, 0);
+
+	strbuf_addstr(sys_info, "ldd --version: ");
+	strbuf_addbuf(sys_info, &std_out);
+	strbuf_complete_line(sys_info);
+
+	argv_array_clear(&cp.args);
+	strbuf_reset(&std_out);
+}
diff --git a/bugreport.h b/bugreport.h
new file mode 100644
index 0000000000..ba216acf3f
--- /dev/null
+++ b/bugreport.h
@@ -0,0 +1,7 @@
+#include "strbuf.h"
+
+/**
+ * Adds the Git version, `uname -a`, and `curl-config --version` to sys_info.
+ * The previous contents of sys_info will be discarded.
+ */
+void get_system_info(struct strbuf *sys_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 2ef16440a0..7232d31be7 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -1,4 +1,5 @@
 #include "builtin.h"
+#include "bugreport.h"
 #include "stdio.h"
 #include "strbuf.h"
 #include "time.h"
@@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
 	return 0;
 }
 
+void add_header(FILE *report, const char *title)
+{
+	struct strbuf buffer = STRBUF_INIT;
+	strbuf_addf(&buffer, "\n\n[%s]\n", title);
+	strbuf_write(&buffer, report);
+}
+
 int cmd_bugreport(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buffer = STRBUF_INIT;
@@ -43,6 +51,11 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_bug_template(&buffer);
 	strbuf_write(&buffer, report);
 
+	// add other contents
+	add_header(report, "System Info");
+	get_system_info(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (2 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 3/9] bugreport: add version and system information Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 14:14       ` Johannes Schindelin
  2019-10-29 20:58       ` Josh Steadmon
  2019-10-25  2:51     ` [PATCH v3 5/9] bugreport: collect list of populated hooks Emily Shaffer
                       ` (5 subsequent siblings)
  9 siblings, 2 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Teach bugreport to gather the values of config options which are present
in 'git-bugreport-config-whitelist'.

Many config options are sensitive, and many Git add-ons use config
options which git-core does not know about; it is better only to gather
config options which we know to be safe, rather than excluding options
which we know to be unsafe.

Reading the whitelist into memory and sorting it saves us time -
since git_config_bugreport() is called for every option the user has
configured, limiting the file IO to one open/read/close and performing
option lookup in sublinear time is a useful optimization.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 50 +++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  7 +++++++
 builtin/bugreport.c |  4 ++++
 3 files changed, 61 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index ada54fe583..afa4836ab1 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -1,10 +1,24 @@
 #include "cache.h"
 
 #include "bugreport.h"
+#include "config.h"
+#include "exec-cmd.h"
 #include "help.h"
 #include "run-command.h"
 #include "version.h"
 
+/**
+ * A sorted list of config options which we will add to the bugreport. Managed
+ * by 'gather_whitelist(...)'.
+ */
+struct string_list whitelist = STRING_LIST_INIT_DUP;
+struct strbuf configs_and_values = STRBUF_INIT;
+
+// git version --build-options
+// uname -a
+// curl-config --version
+// ldd --version
+// echo $SHELL
 void get_system_info(struct strbuf *sys_info)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
 	argv_array_clear(&cp.args);
 	strbuf_reset(&std_out);
 }
+
+void gather_whitelist(struct strbuf *path)
+{
+	struct strbuf tmp = STRBUF_INIT;
+	strbuf_read_file(&tmp, path->buf, 0);
+	string_list_init(&whitelist, 1);
+	string_list_split(&whitelist, tmp.buf, '\n', -1);
+	string_list_sort(&whitelist);
+}
+
+int git_config_bugreport(const char *var, const char *value, void *cb)
+{
+	if (string_list_has_string(&whitelist, var)) {
+		strbuf_addf(&configs_and_values,
+			    "%s : %s\n",
+			    var, value);
+	}
+
+	return 0;
+}
+
+void get_whitelisted_config(struct strbuf *config_info)
+{
+	struct strbuf path = STRBUF_INIT;
+
+	strbuf_addstr(&path, git_exec_path());
+	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
+
+	gather_whitelist(&path);
+	strbuf_init(&configs_and_values, whitelist.nr);
+
+	git_config(git_config_bugreport, NULL);
+
+	strbuf_reset(config_info);
+	strbuf_addbuf(config_info, &configs_and_values);
+}
diff --git a/bugreport.h b/bugreport.h
index ba216acf3f..7413e7e1be 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -5,3 +5,10 @@
  * The previous contents of sys_info will be discarded.
  */
 void get_system_info(struct strbuf *sys_info);
+
+/**
+ * Adds the values of the config items listed in
+ * 'git-bugreport-config-whitelist' to config_info. The previous contents of
+ * config_info will be discarded.
+ */
+void get_whitelisted_config(struct strbuf *sys_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 7232d31be7..70fe0d2b85 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_system_info(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Whitelisted Config");
+	get_whitelisted_config(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 5/9] bugreport: collect list of populated hooks
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (3 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 4/9] bugreport: add config values from whitelist Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 14:31       ` Johannes Schindelin
  2019-10-25  2:51     ` [PATCH v3 6/9] bugreport: count loose objects Emily Shaffer
                       ` (4 subsequent siblings)
  9 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Occasionally a failure a user is seeing may be related to a specific
hook which is being run, perhaps without the user realizing. While the
contents of hooks can be sensitive - containing user data or process
information specific to the user's organization - simply knowing that a
hook is being run at a certain stage can help us to understand whether
something is going wrong.

Without a definitive list of hook names within the code, we compile our
own list from the documentation. This is likely prone to bitrot. To
reduce the amount of code humans need to read, we turn the list into a
string_list and iterate over it (as we are calling the same find_hook
operation on each string). However, since bugreport should primarily be
called by the user, the performance loss from massaging the string
seems acceptable.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  6 ++++++
 builtin/bugreport.c |  4 ++++
 3 files changed, 54 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index afa4836ab1..9d7f44ff28 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -103,3 +103,47 @@ void get_whitelisted_config(struct strbuf *config_info)
 	strbuf_reset(config_info);
 	strbuf_addbuf(config_info, &configs_and_values);
 }
+
+void get_populated_hooks(struct strbuf *hook_info)
+{
+	/*
+	 * Doesn't look like there is a list of all possible hooks; so below is
+	 * a transcription of `git help hook`.
+	 */
+	const char *hooks = "applypatch-msg,"
+			    "pre-applypatch,"
+			    "post-applypatch,"
+			    "pre-commit,"
+			    "pre-merge-commit,"
+			    "prepare-commit-msg,"
+			    "commit-msg,"
+			    "post-commit,"
+			    "pre-rebase,"
+			    "post-checkout,"
+			    "post-merge,"
+			    "pre-push,"
+			    "pre-receive,"
+			    "update,"
+			    "post-receive,"
+			    "post-update,"
+			    "push-to-checkout,"
+			    "pre-auto-gc,"
+			    "post-rewrite,"
+			    "sendemail-validate,"
+			    "fsmonitor-watchman,"
+			    "p4-pre-submit,"
+			    "post-index-changex";
+	struct string_list hooks_list = STRING_LIST_INIT_DUP;
+	struct string_list_item *iter = NULL;
+
+	strbuf_reset(hook_info);
+
+	string_list_split(&hooks_list, hooks, ',', -1);
+
+	for_each_string_list_item(iter, &hooks_list) {
+		if (find_hook(iter->string)) {
+			strbuf_addstr(hook_info, iter->string);
+			strbuf_complete_line(hook_info);
+		}
+	}
+}
diff --git a/bugreport.h b/bugreport.h
index 7413e7e1be..942a5436e3 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -12,3 +12,9 @@ void get_system_info(struct strbuf *sys_info);
  * config_info will be discarded.
  */
 void get_whitelisted_config(struct strbuf *sys_info);
+
+/**
+ * Adds the paths to all configured hooks (but not their contents). The previous
+ * contents of hook_info will be discarded.
+ */
+void get_populated_hooks(struct strbuf *hook_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 70fe0d2b85..a0eefba498 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -60,6 +60,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_whitelisted_config(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Populated Hooks");
+	get_populated_hooks(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 6/9] bugreport: count loose objects
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (4 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 5/9] bugreport: collect list of populated hooks Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 15:07       ` Johannes Schindelin
  2019-10-29 21:18       ` Josh Steadmon
  2019-10-25  2:51     ` [PATCH v3 7/9] bugreport: add packed object summary Emily Shaffer
                       ` (3 subsequent siblings)
  9 siblings, 2 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

The number of unpacked objects in a user's repository may help us
understand the root of the problem they're seeing, especially if a
command is running unusually slowly.

Rather than directly invoking 'git-count-objects', which may sometimes
fail unexpectedly on Git for Windows, manually count the contents of
.git/objects. Additionally, since we may wish to inspect other
directories' contents for bugreport in the future, put the directory
listing into a helper function.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  6 ++++
 builtin/bugreport.c |  4 +++
 3 files changed, 82 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index 9d7f44ff28..54e1d47103 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -5,8 +5,11 @@
 #include "exec-cmd.h"
 #include "help.h"
 #include "run-command.h"
+#include "strbuf.h"
 #include "version.h"
 
+#include "dirent.h"
+
 /**
  * A sorted list of config options which we will add to the bugreport. Managed
  * by 'gather_whitelist(...)'.
@@ -147,3 +150,72 @@ void get_populated_hooks(struct strbuf *hook_info)
 		}
 	}
 }
+
+/**
+ * Fill 'contents' with the contents of the dir at 'dirpath'.
+ * If 'filter' is nonzero, the contents are filtered on d_type as 'type' - see
+ * 'man readdir'. opendir() doesn't take string length as an arg, so don't
+ * bother passing it in.
+ */
+void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,
+			  int filter, unsigned char type)
+{
+	struct dirent *dir = NULL;
+	DIR *dh = NULL;
+
+	dh = opendir(dirpath->buf);
+	while (dh && (dir = readdir(dh))) {
+		if (!filter || type == dir->d_type) {
+			string_list_append(contents, dir->d_name);
+		}
+	}
+}
+
+
+void get_object_counts(struct strbuf *obj_info)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf std_out = STRBUF_INIT;
+
+	argv_array_push(&cp.args, "count-objects");
+	argv_array_push(&cp.args, "-vH");
+	cp.git_cmd = 1;
+	capture_command(&cp, &std_out, 0);
+
+	strbuf_reset(obj_info);
+	strbuf_addstr(obj_info, "git-count-objects -vH:\n");
+	strbuf_addbuf(obj_info, &std_out);
+}
+
+void get_loose_object_summary(struct strbuf *obj_info)
+{
+	struct strbuf dirpath = STRBUF_INIT;
+	struct string_list subdirs = STRING_LIST_INIT_DUP;
+	struct string_list_item *subdir;
+
+	strbuf_reset(obj_info);
+
+	strbuf_addstr(&dirpath, get_object_directory());
+	strbuf_complete(&dirpath, '/');
+
+	list_contents_of_dir(&subdirs, &dirpath, 1, DT_DIR);
+
+	for_each_string_list_item(subdir, &subdirs)
+	{
+		struct strbuf subdir_buf = STRBUF_INIT;
+		struct string_list objects = STRING_LIST_INIT_DUP;
+
+		/*
+		 * Only interested in loose objects - so dirs named with the
+		 * first byte of the object ID
+		 */
+		if (strlen(subdir->string) != 2 || !strcmp(subdir->string, ".."))
+			continue;
+
+		strbuf_addbuf(&subdir_buf, &dirpath);
+		strbuf_addstr(&subdir_buf, subdir->string);
+		list_contents_of_dir(&objects, &subdir_buf, 0, 0);
+		strbuf_addf(obj_info, "%s: %d objects\n", subdir->string,
+			    objects.nr);
+	}
+}
diff --git a/bugreport.h b/bugreport.h
index 942a5436e3..09ad0c2599 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -18,3 +18,9 @@ void get_whitelisted_config(struct strbuf *sys_info);
  * contents of hook_info will be discarded.
  */
 void get_populated_hooks(struct strbuf *hook_info);
+
+/**
+ * Adds the output of `git count-object -vH`. The previous contents of hook_info
+ * will be discarded.
+ */
+void get_loose_object_summary(struct strbuf *obj_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index a0eefba498..b2ab194207 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -64,6 +64,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_populated_hooks(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Object Counts");
+	get_loose_object_summary(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 7/9] bugreport: add packed object summary
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (5 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 6/9] bugreport: count loose objects Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 15:43       ` Johannes Schindelin
  2019-10-25  2:51     ` [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info Emily Shaffer
                       ` (2 subsequent siblings)
  9 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Alongside the list of loose objects, it's useful to see the list of
object packs as well. It can help us to examine what Git did and did not
pack.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 21 +++++++++++++++++++++
 bugreport.h         |  6 ++++++
 builtin/bugreport.c |  4 ++++
 3 files changed, 31 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index 54e1d47103..79ddb8baaa 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -219,3 +219,24 @@ void get_loose_object_summary(struct strbuf *obj_info)
 			    objects.nr);
 	}
 }
+
+void get_packed_object_summary(struct strbuf *obj_info)
+{
+	struct strbuf dirpath = STRBUF_INIT;
+	struct string_list contents = STRING_LIST_INIT_DUP;
+	struct string_list_item *entry;
+
+	strbuf_reset(obj_info);
+
+	strbuf_addstr(&dirpath, get_object_directory());
+	strbuf_complete(&dirpath, '/');
+	strbuf_addstr(&dirpath, "pack/");
+	list_contents_of_dir(&contents, &dirpath, 0, 0);
+
+	// list full contents of $GIT_OBJECT_DIRECTORY/pack/
+	for_each_string_list_item(entry, &contents) {
+		strbuf_addbuf(obj_info, &dirpath);
+		strbuf_addstr(obj_info, entry->string);
+		strbuf_complete_line(obj_info);
+	}
+}
diff --git a/bugreport.h b/bugreport.h
index 09ad0c2599..11ff7df41b 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -24,3 +24,9 @@ void get_populated_hooks(struct strbuf *hook_info);
  * will be discarded.
  */
 void get_loose_object_summary(struct strbuf *obj_info);
+
+/**
+ * Adds a list of the contents of '.git/objects/pack'. The previous contents of
+ * hook_info will be discarded.
+ */
+void get_packed_object_summary(struct strbuf *obj_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index b2ab194207..da91a3944e 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -68,6 +68,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_loose_object_summary(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Packed Object Summary");
+	get_packed_object_summary(&buffer);
+	strbuf_write(&buffer, report);
+
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (6 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 7/9] bugreport: add packed object summary Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 15:51       ` Johannes Schindelin
  2019-10-25  2:51     ` [PATCH v3 9/9] bugreport: print contents of alternates file Emily Shaffer
  2019-10-29  1:54     ` [PATCH v3 0/9] add git-bugreport tool Junio C Hamano
  9 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Miscellaneous information used about the object store can end up in
.git/objects/info; this can help us understand what may be going on with
the object store when the user is reporting a bug. Otherwise, it could
be difficult to track down what is going wrong with an object which
isn't kept locally to .git/objects/ or .git/objects/pack. Having some
understanding of where the user's objects may be kept can save us some
hops during the bug reporting process.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 58 +++++++++++++++++++++++++++++++++++++++++++++
 bugreport.h         |  6 +++++
 builtin/bugreport.c |  7 ++++++
 3 files changed, 71 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index 79ddb8baaa..ce15904fec 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -6,6 +6,7 @@
 #include "help.h"
 #include "run-command.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "version.h"
 
 #include "dirent.h"
@@ -171,6 +172,41 @@ void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,
 	}
 }
 
+/**
+ * Fills 'contents' with a list of all directories within the provided
+ * directory, recursing into each directory.
+ */
+void list_contents_of_dir_recursively(struct string_list *contents,
+				      struct strbuf *dirpath)
+{
+	struct string_list current_contents = STRING_LIST_INIT_DUP;
+	struct string_list current_subdirs = STRING_LIST_INIT_DUP;
+	struct string_list_item *it;
+	struct strbuf buf = STRBUF_INIT;
+
+	list_contents_of_dir(&current_contents, dirpath, 0, 0);
+	for_each_string_list_item(it, &current_contents) {
+		strbuf_reset(&buf);
+		strbuf_addbuf(&buf, dirpath);
+		strbuf_complete(&buf, '/');
+		strbuf_addstr(&buf, it->string);
+
+		string_list_append(contents, buf.buf);
+	}
+
+	list_contents_of_dir(&current_subdirs, dirpath, 1, DT_DIR);
+	for_each_string_list_item(it, &current_subdirs) {
+		if (strcmp(it->string, ".") == 0
+		    || strcmp(it->string, "..") == 0)
+			continue;
+		strbuf_reset(&buf);
+		strbuf_addbuf(&buf, dirpath);
+		strbuf_complete(&buf, '/');
+		strbuf_addstr(&buf, it->string);
+
+		list_contents_of_dir_recursively(contents, &buf);
+	}
+}
 
 void get_object_counts(struct strbuf *obj_info)
 {
@@ -240,3 +276,25 @@ void get_packed_object_summary(struct strbuf *obj_info)
 		strbuf_complete_line(obj_info);
 	}
 }
+
+void get_object_info_summary(struct strbuf *obj_info)
+{
+	// strbuf += GITDIR/info/:
+	// recursively list contents of $GIT_OBJECT_DIRECTORY/info
+	struct strbuf dirpath = STRBUF_INIT;
+	struct string_list contents = STRING_LIST_INIT_DUP;
+	struct string_list_item *entry;
+
+	strbuf_reset(obj_info);
+
+	strbuf_addstr(&dirpath, get_object_directory());
+	strbuf_complete(&dirpath, '/');
+	strbuf_addstr(&dirpath, "info/");
+
+	list_contents_of_dir_recursively(&contents, &dirpath);
+
+	for_each_string_list_item(entry, &contents) {
+		strbuf_addstr(obj_info, entry->string);
+		strbuf_complete_line(obj_info);
+	}
+}
diff --git a/bugreport.h b/bugreport.h
index 11ff7df41b..4f5e2d1b9a 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -30,3 +30,9 @@ void get_loose_object_summary(struct strbuf *obj_info);
  * hook_info will be discarded.
  */
 void get_packed_object_summary(struct strbuf *obj_info);
+
+/**
+ * Adds a list of all contents (recursively) of '.git/objects/info'. The
+ * previous contents of hook_info will be discarded.
+ */
+void get_object_info_summary(struct strbuf *obj_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index da91a3944e..8aad33a9b0 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -72,6 +72,13 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_packed_object_summary(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Object Info Data");
+	get_object_info_summary(&buffer);
+	strbuf_write(&buffer, report);
+
+	// Close file
+	// open file in editor
+	launch_editor(report_path, NULL, NULL);
 	fclose(report);
 
 	launch_editor(report_path.buf, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 9/9] bugreport: print contents of alternates file
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (7 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info Emily Shaffer
@ 2019-10-25  2:51     ` Emily Shaffer
  2019-10-28 15:57       ` Johannes Schindelin
  2019-10-29  1:54     ` [PATCH v3 0/9] add git-bugreport tool Junio C Hamano
  9 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-10-25  2:51 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

In some cases, it could be that the user is having a problem with an
object which isn't present in their normal object directory. We can get
a hint that that might be the case by examining the list of alternates
where their object may be stored instead.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 bugreport.c         | 14 ++++++++++++++
 bugreport.h         |  6 ++++++
 builtin/bugreport.c |  4 ++++
 3 files changed, 24 insertions(+)

diff --git a/bugreport.c b/bugreport.c
index ce15904fec..a7bdc72b7f 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -298,3 +298,17 @@ void get_object_info_summary(struct strbuf *obj_info)
 		strbuf_complete_line(obj_info);
 	}
 }
+
+void get_alternates_file(struct strbuf *alternates_info)
+{
+	struct strbuf alternates_path = STRBUF_INIT;
+
+	strbuf_addstr(&alternates_path, get_object_directory());
+	strbuf_complete(&alternates_path, '/');
+	strbuf_addstr(&alternates_path, "info/alternates");
+
+	strbuf_reset(alternates_info);
+	strbuf_addbuf(alternates_info, &alternates_path);
+	strbuf_complete_line(alternates_info);
+	strbuf_read_file(alternates_info, alternates_path.buf, 0);
+}
diff --git a/bugreport.h b/bugreport.h
index 4f5e2d1b9a..74d1f79960 100644
--- a/bugreport.h
+++ b/bugreport.h
@@ -36,3 +36,9 @@ void get_packed_object_summary(struct strbuf *obj_info);
  * previous contents of hook_info will be discarded.
  */
 void get_object_info_summary(struct strbuf *obj_info);
+
+/**
+ * Adds the contents of '.git/info/alternates'. The previous contents of
+ * alternates_info will be discarded.
+ */
+void get_alternates_file(struct strbuf *alt_info);
diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index 8aad33a9b0..0784bdc42a 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -76,6 +76,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_object_info_summary(&buffer);
 	strbuf_write(&buffer, report);
 
+	add_header(report, "Alternates File");
+	get_alternates_file(&buffer);
+	strbuf_write(&buffer, report);
+
 	// Close file
 	// open file in editor
 	launch_editor(report_path, NULL, NULL);
-- 
2.24.0.rc0.303.g954a862665-goog


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 2/9] bugreport: generate config whitelist based on docs
  2019-10-25  2:51     ` [PATCH v3 2/9] bugreport: generate config whitelist based on docs Emily Shaffer
@ 2019-10-28 13:27       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 13:27 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> diff --git a/bugreport-generate-config-whitelist.sh b/bugreport-generate-config-whitelist.sh
> new file mode 100755
> index 0000000000..ca6b232024
> --- /dev/null
> +++ b/bugreport-generate-config-whitelist.sh
> @@ -0,0 +1,4 @@
> +#!/bin/sh
> +
> +grep -RhPo ".*(?=:: \/\/ bugreport-include)" Documentation/config \

I am rather certain that `-P` is not supported by BSD grep (see
https://man.openbsd.org/grep).

Why not something portable, e.g.

	find Documentation/config -type f -exec cat {} \; |
	sed -n 's/^\(.*\):: \/\/ bugreport-include$/\1/p' \

?

Ciao,
Dscho

> +  >git-bugreport-config-whitelist
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 3/9] bugreport: add version and system information
  2019-10-25  2:51     ` [PATCH v3 3/9] bugreport: add version and system information Emily Shaffer
@ 2019-10-28 13:49       ` Johannes Schindelin
  2019-11-08 21:48         ` Emily Shaffer
  2019-10-29 20:43       ` Josh Steadmon
  1 sibling, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 13:49 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> diff --git a/bugreport.c b/bugreport.c
> new file mode 100644
> index 0000000000..ada54fe583
> --- /dev/null
> +++ b/bugreport.c
> @@ -0,0 +1,55 @@
> +#include "cache.h"
> +
> +#include "bugreport.h"
> +#include "help.h"
> +#include "run-command.h"
> +#include "version.h"
> +
> +void get_system_info(struct strbuf *sys_info)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf std_out = STRBUF_INIT;
> +
> +	strbuf_reset(sys_info);
> +
> +	// get git version from native cmd

Please use C-style comments instead of C++ ones.

> +	strbuf_addstr(sys_info, "git version: ");
> +	strbuf_addstr(sys_info, git_version_string);
> +	strbuf_complete_line(sys_info);
> +
> +	// system call for other version info
> +	argv_array_push(&cp.args, "uname");
> +	argv_array_push(&cp.args, "-a");
> +	capture_command(&cp, &std_out, 0);

Mmmkay. I am not too much of a fan of relying on the `uname` executable,
as it can very well be a 32-bit `uname.exe` on Windows, which obviously
would _not_ report the architecture of the machine, but something
misleading.

Why not use the `uname()` function (that we even define in
`compat/mingw.c`)?

Also, why not include the same information as `git version
--build-options`, most importantly `GIT_HOST_CPU`?

In any case, if you are capturing the output of `uname -a`, you
definitely will want to pass the `RUN_COMMAND_STDOUT_TO_STDERR` flag (in
case, say, the MSYS2 `uname.exe` fails with those dreaded Cygwin error
messages like "*** fatal error - add_item
("\??\D:\git\installation\Git", "/", ...) failed, errno 1").

> +
> +	strbuf_addstr(sys_info, "uname -a: ");
> +	strbuf_addbuf(sys_info, &std_out);
> +	strbuf_complete_line(sys_info);
> +
> +	argv_array_clear(&cp.args);
> +	strbuf_reset(&std_out);
> +
> +
> +	argv_array_push(&cp.args, "curl-config");
> +	argv_array_push(&cp.args, "--version");
> +	capture_command(&cp, &std_out, 0);

This will be almost certainly be incorrect, as most _regular_ users
won't have `curl-config`. I know, it is easy to forget that most Git
users are no hard-core C developers ;-)

A better alternative would be to use `curl_version()`, guarded, of
course, by `#ifndef NO_CURL`...

> +
> +	strbuf_addstr(sys_info, "curl-config --version: ");
> +	strbuf_addbuf(sys_info, &std_out);
> +	strbuf_complete_line(sys_info);
> +
> +	argv_array_clear(&cp.args);
> +	strbuf_reset(&std_out);
> +
> +
> +	argv_array_push(&cp.args, "ldd");
> +	argv_array_push(&cp.args, "--version");
> +	capture_command(&cp, &std_out, 0);

Again, this command will only be present in few setups. I am not
actually sure that the output of this is interesting to begin with.

What I _do_ think is that a much more interesting piece of information
would be the exact GLIBC version, the OS name and the OS version.

> +
> +	strbuf_addstr(sys_info, "ldd --version: ");
> +	strbuf_addbuf(sys_info, &std_out);
> +	strbuf_complete_line(sys_info);
> +
> +	argv_array_clear(&cp.args);
> +	strbuf_reset(&std_out);
> +}
> diff --git a/bugreport.h b/bugreport.h
> new file mode 100644
> index 0000000000..ba216acf3f
> --- /dev/null
> +++ b/bugreport.h
> @@ -0,0 +1,7 @@
> +#include "strbuf.h"
> +
> +/**
> + * Adds the Git version, `uname -a`, and `curl-config --version` to sys_info.
> + * The previous contents of sys_info will be discarded.
> + */
> +void get_system_info(struct strbuf *sys_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 2ef16440a0..7232d31be7 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -1,4 +1,5 @@
>  #include "builtin.h"
> +#include "bugreport.h"
>  #include "stdio.h"
>  #include "strbuf.h"
>  #include "time.h"
> @@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
>  	return 0;
>  }
>
> +void add_header(FILE *report, const char *title)
> +{
> +	struct strbuf buffer = STRBUF_INIT;
> +	strbuf_addf(&buffer, "\n\n[%s]\n", title);
> +	strbuf_write(&buffer, report);

This leaks `buffer`. Why not write into `report` via `fprintf()`
directly?

Ciao,
Dscho

> +}
> +
>  int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  {
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -43,6 +51,11 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_bug_template(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	// add other contents
> +	add_header(report, "System Info");
> +	get_system_info(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-25  2:51     ` [PATCH v3 4/9] bugreport: add config values from whitelist Emily Shaffer
@ 2019-10-28 14:14       ` Johannes Schindelin
  2019-12-11 20:48         ` Emily Shaffer
  2019-10-29 20:58       ` Josh Steadmon
  1 sibling, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 14:14 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> Teach bugreport to gather the values of config options which are present
> in 'git-bugreport-config-whitelist'.
>
> Many config options are sensitive, and many Git add-ons use config
> options which git-core does not know about; it is better only to gather
> config options which we know to be safe, rather than excluding options
> which we know to be unsafe.

Should we still have the `// bugreport-exclude` comments, then?

>
> Reading the whitelist into memory and sorting it saves us time -
> since git_config_bugreport() is called for every option the user has
> configured, limiting the file IO to one open/read/close and performing
> option lookup in sublinear time is a useful optimization.

Maybe we even want a hashmap? That would reduce the time complexity even
further.

> diff --git a/bugreport.c b/bugreport.c
> index ada54fe583..afa4836ab1 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -1,10 +1,24 @@
>  #include "cache.h"
>
>  #include "bugreport.h"
> +#include "config.h"
> +#include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
>  #include "version.h"
>
> +/**
> + * A sorted list of config options which we will add to the bugreport. Managed
> + * by 'gather_whitelist(...)'.
> + */
> +struct string_list whitelist = STRING_LIST_INIT_DUP;
> +struct strbuf configs_and_values = STRBUF_INIT;
> +
> +// git version --build-options
> +// uname -a
> +// curl-config --version
> +// ldd --version
> +// echo $SHELL

These comments probably want to move to a single, C style comment, and
they probably want to be introduced together with `get_system_info()`.

I also have to admit that I might have missed where `$SHELL` was added
to the output...

>  void get_system_info(struct strbuf *sys_info)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
>  	argv_array_clear(&cp.args);
>  	strbuf_reset(&std_out);
>  }
> +
> +void gather_whitelist(struct strbuf *path)
> +{
> +	struct strbuf tmp = STRBUF_INIT;
> +	strbuf_read_file(&tmp, path->buf, 0);
> +	string_list_init(&whitelist, 1);
> +	string_list_split(&whitelist, tmp.buf, '\n', -1);
> +	string_list_sort(&whitelist);
> +}
> +
> +int git_config_bugreport(const char *var, const char *value, void *cb)
> +{
> +	if (string_list_has_string(&whitelist, var)) {
> +		strbuf_addf(&configs_and_values,
> +			    "%s : %s\n",
> +			    var, value);

A quite useful piece of information would be the config source. Not sure
whether we can do that outside of `config.c` yet...

> +	}
> +
> +	return 0;
> +}
> +
> +void get_whitelisted_config(struct strbuf *config_info)
> +{
> +	struct strbuf path = STRBUF_INIT;
> +
> +	strbuf_addstr(&path, git_exec_path());
> +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");

Hmm. I would have expected this patch to come directly after the patch
2/9 that generates that white-list, and I would also have expected that
to be pre-sorted, and compiled in.

Do you want users to _edit_ the file in the exec path? In general, that
path will be write-protected, though. A better alternative would
probably be to compile in a hard-coded list, and to allow including more
values e.g. by offering command-line options to specify config setting
patterns. But if we allow patterns, we might actually want to have those
exclusions to prevent sensitive data from being included.

> +
> +	gather_whitelist(&path);
> +	strbuf_init(&configs_and_values, whitelist.nr);
> +
> +	git_config(git_config_bugreport, NULL);
> +
> +	strbuf_reset(config_info);
> +	strbuf_addbuf(config_info, &configs_and_values);
> +}
> diff --git a/bugreport.h b/bugreport.h
> index ba216acf3f..7413e7e1be 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -5,3 +5,10 @@
>   * The previous contents of sys_info will be discarded.
>   */
>  void get_system_info(struct strbuf *sys_info);
> +
> +/**

I also frequently use JavaDoc-style `/**`, but I am not sure that this
is actually desired in Git's source code ;-)

> + * Adds the values of the config items listed in
> + * 'git-bugreport-config-whitelist' to config_info. The previous contents of
> + * config_info will be discarded.
> + */
> +void get_whitelisted_config(struct strbuf *sys_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 7232d31be7..70fe0d2b85 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_system_info(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Whitelisted Config");

Quite honestly, I would like to avoid the term "whitelist" for good. How
about "Selected config settings" instead?

Thanks,
Dscho

> +	get_whitelisted_config(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 5/9] bugreport: collect list of populated hooks
  2019-10-25  2:51     ` [PATCH v3 5/9] bugreport: collect list of populated hooks Emily Shaffer
@ 2019-10-28 14:31       ` Johannes Schindelin
  2019-12-11 20:51         ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 14:31 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> Occasionally a failure a user is seeing may be related to a specific
> hook which is being run, perhaps without the user realizing. While the
> contents of hooks can be sensitive - containing user data or process
> information specific to the user's organization - simply knowing that a
> hook is being run at a certain stage can help us to understand whether
> something is going wrong.
>
> Without a definitive list of hook names within the code, we compile our
> own list from the documentation. This is likely prone to bitrot. To
> reduce the amount of code humans need to read, we turn the list into a
> string_list and iterate over it (as we are calling the same find_hook
> operation on each string). However, since bugreport should primarily be
> called by the user, the performance loss from massaging the string
> seems acceptable.

Good idea!

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  bugreport.h         |  6 ++++++
>  builtin/bugreport.c |  4 ++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index afa4836ab1..9d7f44ff28 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -103,3 +103,47 @@ void get_whitelisted_config(struct strbuf *config_info)
>  	strbuf_reset(config_info);
>  	strbuf_addbuf(config_info, &configs_and_values);
>  }
> +
> +void get_populated_hooks(struct strbuf *hook_info)
> +{
> +	/*
> +	 * Doesn't look like there is a list of all possible hooks; so below is
> +	 * a transcription of `git help hook`.
> +	 */
> +	const char *hooks = "applypatch-msg,"
> +			    "pre-applypatch,"
> +			    "post-applypatch,"
> +			    "pre-commit,"
> +			    "pre-merge-commit,"
> +			    "prepare-commit-msg,"
> +			    "commit-msg,"
> +			    "post-commit,"
> +			    "pre-rebase,"
> +			    "post-checkout,"
> +			    "post-merge,"
> +			    "pre-push,"
> +			    "pre-receive,"
> +			    "update,"
> +			    "post-receive,"
> +			    "post-update,"
> +			    "push-to-checkout,"
> +			    "pre-auto-gc,"
> +			    "post-rewrite,"
> +			    "sendemail-validate,"
> +			    "fsmonitor-watchman,"
> +			    "p4-pre-submit,"
> +			    "post-index-changex";

Well, this is disappointing: I tried to come up with a scripted way to
extract the commit names from our source code, and I failed! This is
where I gave up:

	git grep run_hook |
	sed -n 's/.*run_hook[^(]*([^,]*, *\([^,]*\).*/\1/p' |
	sed -e '/^name$/d' -e '/^const char \*name$/d' \
		-e 's/push_to_checkout_hook/"push-to-checkout"/' |
	sort |
	uniq

This prints only the following hook names:

	"applypatch-msg"
	"post-applypatch"
	"post-checkout"
	"post-index-change"
	"post-merge"
	"pre-applypatch"
	"pre-auto-gc"
	"pre-rebase"
	"prepare-commit-msg"
	"push-to-checkout"

But at least it was easy to script the extracting from the
Documentation:

	sed -n '/^[a-z]/{N;/\n~~~/{s/\n.*//;p}}' \
		Documentation/githooks.txt

> +	struct string_list hooks_list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *iter = NULL;
> +
> +	strbuf_reset(hook_info);
> +
> +	string_list_split(&hooks_list, hooks, ',', -1);
> +
> +	for_each_string_list_item(iter, &hooks_list) {

This should definitely be done at compile time, I think. We should be
able to generate an appropriate header via something like this:

	cat >hook-names.h <<-EOF
	static const char *hook_names[] = {
	$(sed -n '/^[a-z]/{N;/\n~~~/{s/\n.*/",/;s/^/	"/;p}}' \
		Documentation/githooks.txt)
	};
	EOF

Then you would use a simple `for()` loop using `ARRAY_SIZE()` to
iterate over the hook names.

> +		if (find_hook(iter->string)) {
> +			strbuf_addstr(hook_info, iter->string);
> +			strbuf_complete_line(hook_info);
> +		}
> +	}
> +}
> diff --git a/bugreport.h b/bugreport.h
> index 7413e7e1be..942a5436e3 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -12,3 +12,9 @@ void get_system_info(struct strbuf *sys_info);
>   * config_info will be discarded.
>   */
>  void get_whitelisted_config(struct strbuf *sys_info);
> +
> +/**
> + * Adds the paths to all configured hooks (but not their contents). The previous
> + * contents of hook_info will be discarded.
> + */
> +void get_populated_hooks(struct strbuf *hook_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 70fe0d2b85..a0eefba498 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -60,6 +60,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_whitelisted_config(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Populated Hooks");

Wait! I should have stumbled over this in an earlier patch. The
`add_header()` function should not take a `FILE *` parameter at all, but
instead an `struct strbuf *` one!

Ciao,
Dscho

> +	get_populated_hooks(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/9] bugreport: count loose objects
  2019-10-25  2:51     ` [PATCH v3 6/9] bugreport: count loose objects Emily Shaffer
@ 2019-10-28 15:07       ` Johannes Schindelin
  2019-12-10 22:34         ` Emily Shaffer
  2019-10-29 21:18       ` Josh Steadmon
  1 sibling, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 15:07 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> The number of unpacked objects in a user's repository may help us
> understand the root of the problem they're seeing, especially if a
> command is running unusually slowly.
>
> Rather than directly invoking 'git-count-objects', which may sometimes
> fail unexpectedly on Git for Windows, manually count the contents of
> .git/objects. Additionally, since we may wish to inspect other
> directories' contents for bugreport in the future, put the directory
> listing into a helper function.

Thank you, much appreciated!

I guess the next step is to count the number of packs, and the number of
submodules ;-)

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++
>  bugreport.h         |  6 ++++
>  builtin/bugreport.c |  4 +++
>  3 files changed, 82 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index 9d7f44ff28..54e1d47103 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -5,8 +5,11 @@
>  #include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
> +#include "strbuf.h"

Why not append this to the end of the `#include` list, as is common in
Git's commit history?

>  #include "version.h"
>
> +#include "dirent.h"

This header (although with pointy brackets instead of double quotes) is
already included in `git-compat-util.h`

> +
>  /**
>   * A sorted list of config options which we will add to the bugreport. Managed
>   * by 'gather_whitelist(...)'.
> @@ -147,3 +150,72 @@ void get_populated_hooks(struct strbuf *hook_info)
>  		}
>  	}
>  }
> +
> +/**
> + * Fill 'contents' with the contents of the dir at 'dirpath'.

Since you start this comment in JavaDoc style, there should be an almost
empty line after this one ("almost" because it still contains the
asterisk, of course).

> + * If 'filter' is nonzero, the contents are filtered on d_type as 'type' - see
> + * 'man readdir'. opendir() doesn't take string length as an arg, so don't
> + * bother passing it in.
> + */
> +void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,

Shouldn't this be `static`?

> +			  int filter, unsigned char type)
> +{
> +	struct dirent *dir = NULL;
> +	DIR *dh = NULL;
> +
> +	dh = opendir(dirpath->buf);
> +	while (dh && (dir = readdir(dh))) {
> +		if (!filter || type == dir->d_type) {
> +			string_list_append(contents, dir->d_name);
> +		}
> +	}
> +}
> +
> +
> +void get_object_counts(struct strbuf *obj_info)

Oops. This function is no longer used.

> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf std_out = STRBUF_INIT;
> +
> +	argv_array_push(&cp.args, "count-objects");
> +	argv_array_push(&cp.args, "-vH");
> +	cp.git_cmd = 1;
> +	capture_command(&cp, &std_out, 0);
> +
> +	strbuf_reset(obj_info);
> +	strbuf_addstr(obj_info, "git-count-objects -vH:\n");
> +	strbuf_addbuf(obj_info, &std_out);
> +}
> +
> +void get_loose_object_summary(struct strbuf *obj_info)
> +{
> +	struct strbuf dirpath = STRBUF_INIT;
> +	struct string_list subdirs = STRING_LIST_INIT_DUP;
> +	struct string_list_item *subdir;
> +
> +	strbuf_reset(obj_info);
> +
> +	strbuf_addstr(&dirpath, get_object_directory());
> +	strbuf_complete(&dirpath, '/');
> +
> +	list_contents_of_dir(&subdirs, &dirpath, 1, DT_DIR);
> +
> +	for_each_string_list_item(subdir, &subdirs)
> +	{
> +		struct strbuf subdir_buf = STRBUF_INIT;
> +		struct string_list objects = STRING_LIST_INIT_DUP;
> +
> +		/*
> +		 * Only interested in loose objects - so dirs named with the
> +		 * first byte of the object ID
> +		 */
> +		if (strlen(subdir->string) != 2 || !strcmp(subdir->string, ".."))
> +			continue;
> +
> +		strbuf_addbuf(&subdir_buf, &dirpath);
> +		strbuf_addstr(&subdir_buf, subdir->string);
> +		list_contents_of_dir(&objects, &subdir_buf, 0, 0);
> +		strbuf_addf(obj_info, "%s: %d objects\n", subdir->string,
> +			    objects.nr);

Hmm. Not only does this leak `objects`, it also throws away the contents
that we so painfully constructed.

Wouldn't it make more sense to do something like this instead?

static int is_hex(const char *string, size_t count)
{
	for (; count; string++, count--)
		if (hexval(*string) < 0)
			return 0;
	return 1;
}

static ssize_t count_loose_objects(struct strbuf *objects_path)
{
	ssize_t ret = 0;
	size_t len;
	struct dirent *d;
	DIR *dir, *subdir;

	dir = opendir(objects_path->buf);
	if (!dir)
		return -1;

	strbuf_complete(objects_path, '/');
	len = objects_path->len;
	while ((d = readdir(dir))) {
		if (d->d_type != DT_DIR)
			continue;
		strbuf_setlen(objects_path, len);
		strbuf_addstr(objects_path, d->d_name);
		subdir = opendir(objects_path->buf);
		if (!subdir)
			continue;
		while ((d = readdir(subdir)))
			if (d->dt_type == DT_REG &&
			    is_hex(dir->d_name, the_repository->hash_algo->hexsz))
				ret++;
		closedir(subdir);
	}
	closedir(dir);
	strbuf_reset(objects_path, len);
	return ret;
}

Ciao,
Dscho

> +	}
> +}
> diff --git a/bugreport.h b/bugreport.h
> index 942a5436e3..09ad0c2599 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -18,3 +18,9 @@ void get_whitelisted_config(struct strbuf *sys_info);
>   * contents of hook_info will be discarded.
>   */
>  void get_populated_hooks(struct strbuf *hook_info);
> +
> +/**
> + * Adds the output of `git count-object -vH`. The previous contents of hook_info
> + * will be discarded.
> + */
> +void get_loose_object_summary(struct strbuf *obj_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index a0eefba498..b2ab194207 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -64,6 +64,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_populated_hooks(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Object Counts");
> +	get_loose_object_summary(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 7/9] bugreport: add packed object summary
  2019-10-25  2:51     ` [PATCH v3 7/9] bugreport: add packed object summary Emily Shaffer
@ 2019-10-28 15:43       ` Johannes Schindelin
  2019-12-11  0:29         ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 15:43 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> Alongside the list of loose objects, it's useful to see the list of
> object packs as well. It can help us to examine what Git did and did not
> pack.

Yes, I was write! Packs are next ;-)

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 21 +++++++++++++++++++++
>  bugreport.h         |  6 ++++++
>  builtin/bugreport.c |  4 ++++
>  3 files changed, 31 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index 54e1d47103..79ddb8baaa 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -219,3 +219,24 @@ void get_loose_object_summary(struct strbuf *obj_info)
>  			    objects.nr);
>  	}
>  }
> +
> +void get_packed_object_summary(struct strbuf *obj_info)
> +{
> +	struct strbuf dirpath = STRBUF_INIT;
> +	struct string_list contents = STRING_LIST_INIT_DUP;
> +	struct string_list_item *entry;
> +
> +	strbuf_reset(obj_info);
> +
> +	strbuf_addstr(&dirpath, get_object_directory());
> +	strbuf_complete(&dirpath, '/');
> +	strbuf_addstr(&dirpath, "pack/");
> +	list_contents_of_dir(&contents, &dirpath, 0, 0);
> +
> +	// list full contents of $GIT_OBJECT_DIRECTORY/pack/
> +	for_each_string_list_item(entry, &contents) {
> +		strbuf_addbuf(obj_info, &dirpath);
> +		strbuf_addstr(obj_info, entry->string);
> +		strbuf_complete_line(obj_info);
> +	}
> +}

Okay, but I think that you will want to discern between regular `.pack`
files, `.idx` files and `tmp_*` files.

> diff --git a/bugreport.h b/bugreport.h
> index 09ad0c2599..11ff7df41b 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -24,3 +24,9 @@ void get_populated_hooks(struct strbuf *hook_info);
>   * will be discarded.
>   */
>  void get_loose_object_summary(struct strbuf *obj_info);
> +
> +/**
> + * Adds a list of the contents of '.git/objects/pack'. The previous contents of
> + * hook_info will be discarded.
> + */
> +void get_packed_object_summary(struct strbuf *obj_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index b2ab194207..da91a3944e 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -68,6 +68,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_loose_object_summary(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Packed Object Summary");
> +	get_packed_object_summary(&buffer);
> +	strbuf_write(&buffer, report);
> +

Hmm. At this point, I am unclear whether you want to write into an
`strbuf`, or directly into a `FILE *`? I would rather have only one, not
a mix.

Ciao,
Dscho

>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info
  2019-10-25  2:51     ` [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info Emily Shaffer
@ 2019-10-28 15:51       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 15:51 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> Miscellaneous information used about the object store can end up in
> .git/objects/info; this can help us understand what may be going on with
> the object store when the user is reporting a bug. Otherwise, it could
> be difficult to track down what is going wrong with an object which
> isn't kept locally to .git/objects/ or .git/objects/pack. Having some
> understanding of where the user's objects may be kept can save us some
> hops during the bug reporting process.

Makes sense.

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 58 +++++++++++++++++++++++++++++++++++++++++++++
>  bugreport.h         |  6 +++++
>  builtin/bugreport.c |  7 ++++++
>  3 files changed, 71 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index 79ddb8baaa..ce15904fec 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -6,6 +6,7 @@
>  #include "help.h"
>  #include "run-command.h"
>  #include "strbuf.h"
> +#include "string-list.h"
>  #include "version.h"
>
>  #include "dirent.h"
> @@ -171,6 +172,41 @@ void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,
>  	}
>  }
>
> +/**
> + * Fills 'contents' with a list of all directories within the provided
> + * directory, recursing into each directory.
> + */
> +void list_contents_of_dir_recursively(struct string_list *contents,
> +				      struct strbuf *dirpath)
> +{
> +	struct string_list current_contents = STRING_LIST_INIT_DUP;
> +	struct string_list current_subdirs = STRING_LIST_INIT_DUP;

Rather than populating those string lists, why not write into the
`contents` directly, _while_ traversing the directories?

But if you really, really want to use string lists (hey, I'm a fan, I
introduced them to Git's source code, after all), then please make sure
to release them in the end, too.

> +	struct string_list_item *it;
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	list_contents_of_dir(&current_contents, dirpath, 0, 0);
> +	for_each_string_list_item(it, &current_contents) {
> +		strbuf_reset(&buf);
> +		strbuf_addbuf(&buf, dirpath);
> +		strbuf_complete(&buf, '/');
> +		strbuf_addstr(&buf, it->string);

You don't need a separate `strbuf` for that: just complete `dirpath`
with a trailing slash, remember the length, then in each iteration reset
the length and append the current subdirectory name.

> +
> +		string_list_append(contents, buf.buf);
> +	}
> +
> +	list_contents_of_dir(&current_subdirs, dirpath, 1, DT_DIR);
> +	for_each_string_list_item(it, &current_subdirs) {
> +		if (strcmp(it->string, ".") == 0
> +		    || strcmp(it->string, "..") == 0)
> +			continue;
> +		strbuf_reset(&buf);
> +		strbuf_addbuf(&buf, dirpath);
> +		strbuf_complete(&buf, '/');
> +		strbuf_addstr(&buf, it->string);
> +
> +		list_contents_of_dir_recursively(contents, &buf);
> +	}
> +}
>
>  void get_object_counts(struct strbuf *obj_info)
>  {
> @@ -240,3 +276,25 @@ void get_packed_object_summary(struct strbuf *obj_info)
>  		strbuf_complete_line(obj_info);
>  	}
>  }
> +
> +void get_object_info_summary(struct strbuf *obj_info)
> +{
> +	// strbuf += GITDIR/info/:
> +	// recursively list contents of $GIT_OBJECT_DIRECTORY/info
> +	struct strbuf dirpath = STRBUF_INIT;
> +	struct string_list contents = STRING_LIST_INIT_DUP;
> +	struct string_list_item *entry;
> +
> +	strbuf_reset(obj_info);
> +
> +	strbuf_addstr(&dirpath, get_object_directory());
> +	strbuf_complete(&dirpath, '/');
> +	strbuf_addstr(&dirpath, "info/");
> +
> +	list_contents_of_dir_recursively(&contents, &dirpath);
> +
> +	for_each_string_list_item(entry, &contents) {
> +		strbuf_addstr(obj_info, entry->string);
> +		strbuf_complete_line(obj_info);
> +	}

As mentioned earlier, I would rather write things into `obj_info`
directly, without detouring to a string list.

Or, if you decide that you want to write to a `FILE *` anyway, I'd
rather avoid the detour to the `strbuf` altogether.

Ciao,
Dscho

> +}
> diff --git a/bugreport.h b/bugreport.h
> index 11ff7df41b..4f5e2d1b9a 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -30,3 +30,9 @@ void get_loose_object_summary(struct strbuf *obj_info);
>   * hook_info will be discarded.
>   */
>  void get_packed_object_summary(struct strbuf *obj_info);
> +
> +/**
> + * Adds a list of all contents (recursively) of '.git/objects/info'. The
> + * previous contents of hook_info will be discarded.
> + */
> +void get_object_info_summary(struct strbuf *obj_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index da91a3944e..8aad33a9b0 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -72,6 +72,13 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_packed_object_summary(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Object Info Data");
> +	get_object_info_summary(&buffer);
> +	strbuf_write(&buffer, report);
> +
> +	// Close file
> +	// open file in editor
> +	launch_editor(report_path, NULL, NULL);
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 9/9] bugreport: print contents of alternates file
  2019-10-25  2:51     ` [PATCH v3 9/9] bugreport: print contents of alternates file Emily Shaffer
@ 2019-10-28 15:57       ` Johannes Schindelin
  2019-11-19 20:40         ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-28 15:57 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> In some cases, it could be that the user is having a problem with an
> object which isn't present in their normal object directory. We can get
> a hint that that might be the case by examining the list of alternates
> where their object may be stored instead.

Doesn't this open the possibility of leaking project's (possibly NDA'ed) names?

I could imagine that we might rather want to count the alternates, and
maybe separate into those alternates that actually exist and alternates
that do not exist (which would produce a warning that the user might
have trained themselves to ignore).

Ciao,
Dscho

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 14 ++++++++++++++
>  bugreport.h         |  6 ++++++
>  builtin/bugreport.c |  4 ++++
>  3 files changed, 24 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index ce15904fec..a7bdc72b7f 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -298,3 +298,17 @@ void get_object_info_summary(struct strbuf *obj_info)
>  		strbuf_complete_line(obj_info);
>  	}
>  }
> +
> +void get_alternates_file(struct strbuf *alternates_info)
> +{
> +	struct strbuf alternates_path = STRBUF_INIT;
> +
> +	strbuf_addstr(&alternates_path, get_object_directory());
> +	strbuf_complete(&alternates_path, '/');
> +	strbuf_addstr(&alternates_path, "info/alternates");
> +
> +	strbuf_reset(alternates_info);
> +	strbuf_addbuf(alternates_info, &alternates_path);
> +	strbuf_complete_line(alternates_info);
> +	strbuf_read_file(alternates_info, alternates_path.buf, 0);
> +}
> diff --git a/bugreport.h b/bugreport.h
> index 4f5e2d1b9a..74d1f79960 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -36,3 +36,9 @@ void get_packed_object_summary(struct strbuf *obj_info);
>   * previous contents of hook_info will be discarded.
>   */
>  void get_object_info_summary(struct strbuf *obj_info);
> +
> +/**
> + * Adds the contents of '.git/info/alternates'. The previous contents of
> + * alternates_info will be discarded.
> + */
> +void get_alternates_file(struct strbuf *alt_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 8aad33a9b0..0784bdc42a 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -76,6 +76,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_object_info_summary(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Alternates File");
> +	get_alternates_file(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	// Close file
>  	// open file in editor
>  	launch_editor(report_path, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/9] add git-bugreport tool
  2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
                       ` (8 preceding siblings ...)
  2019-10-25  2:51     ` [PATCH v3 9/9] bugreport: print contents of alternates file Emily Shaffer
@ 2019-10-29  1:54     ` Junio C Hamano
  2019-10-29 11:13       ` Johannes Schindelin
  9 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-10-29  1:54 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Derrick Stolee, Johannes Schindelin

Emily Shaffer <emilyshaffer@google.com> writes:

> Thanks for the patience with the long wait, all. Here's an attempt at
> the rewrite in C; I think it does verbatim what the sh version did
> except that this doesn't print the reflog - Jonathan Nieder was good
> enough to point out to me that folks probably don't want to share their
> commit subjects all willy-nilly if they're working on something
> secretive.

Is the goal to give a tool the end users can type "git
bugreport<RET>" and automatically send the result to this mailing
list (or some bug tracker)?  Or is this only about producing a
pre-filled bug report template to be slurped into their MUA and then
further manually edited before sending it out?

It probably is controversial if we exposed contents of hooks scripts
(I can imagine some people may trigger external process by pinging a
URL that includes credential material out of laziness), so the
presence test you have is probably a good stopping point.  I do not
know how much it helps to know which hooks exist in order to
diagnose an anomaly without knowing what their contents are, but it
is a start.

By the way, I doubt that it is the best use of developer time to
write these in C---taking various pieces of information from
different places to prepare a text file sounds more suited to
scripting languages, especially while we are still learning what
kind of information we want to collect and how we want to present
the final result (iow, for the first handful of years after
deploying this command).

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/9] add git-bugreport tool
  2019-10-29  1:54     ` [PATCH v3 0/9] add git-bugreport tool Junio C Hamano
@ 2019-10-29 11:13       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-10-29 11:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emily Shaffer, git, Derrick Stolee

Hi Junio,

On Tue, 29 Oct 2019, Junio C Hamano wrote:

> Emily Shaffer <emilyshaffer@google.com> writes:
>
> > Thanks for the patience with the long wait, all. Here's an attempt at
> > the rewrite in C; I think it does verbatim what the sh version did
> > except that this doesn't print the reflog - Jonathan Nieder was good
> > enough to point out to me that folks probably don't want to share their
> > commit subjects all willy-nilly if they're working on something
> > secretive.
>
> Is the goal to give a tool the end users can type "git
> bugreport<RET>" and automatically send the result to this mailing
> list (or some bug tracker)?  Or is this only about producing a
> pre-filled bug report template to be slurped into their MUA and then
> further manually edited before sending it out?
>
> It probably is controversial if we exposed contents of hooks scripts
> (I can imagine some people may trigger external process by pinging a
> URL that includes credential material out of laziness), so the
> presence test you have is probably a good stopping point.  I do not
> know how much it helps to know which hooks exist in order to
> diagnose an anomaly without knowing what their contents are, but it
> is a start.
>
> By the way, I doubt that it is the best use of developer time to
> write these in C---taking various pieces of information from
> different places to prepare a text file sounds more suited to
> scripting languages, especially while we are still learning what
> kind of information we want to collect and how we want to present
> the final result (iow, for the first handful of years after
> deploying this command).

If you want to limit Git to Linux, then relying on a specific scripting
language like Unix shell scripting sounds okay.

If you target more platforms, then a range of scripting languages sounds
a lot more sensible, my top choice would be node.js.

I understand that that is not an option here, and there is a strong
preference for Unix shell scripting in Git. This is a bit unfortunate:

A good chunk of the bug reports I see in Git for Windows are _about_
problems with Unix shell scripting. There are fork problems, problems
where `add_item` failed (which is one of the least helpful error
messages I saw in my entire life), there are problems with anti-malware
causing segmentation faults in the Unix shell script interpreter (the
MSYS2 Bash), and there are even (believe it or not) problems with
_display drivers_. Yes, that is right, display drivers can interfere
with Unix shell script execution on Windows. (This is really rare, but
hey, bug reports are rare in Git for Windows to begin with...)

All of these problems _prevent_ Unix shell scripts from being
interpreted as expected on Windows. Which would ridicule the idea of
reporting any of these bugs using a tool that is implemented in Unix
shell.

And what is worse than a bug in an application? A bug in the
application's very own bug reporting tool.

I pointed this out to Emily, and that is the reason why this amount of
work was put in to _improve_ the tool, precisely by _not_ implementing
it as a Unix shell script, but instead in plain C (which is a _lot_ more
robust).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
@ 2019-10-29 20:29       ` Josh Steadmon
  2019-11-16  3:11       ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Josh Steadmon @ 2019-10-29 20:29 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On 2019.10.24 19:51, Emily Shaffer wrote:
> Teach Git how to prompt the user for a good bug report: reproduction
> steps, expected behavior, and actual behavior. Later, Git can learn how
> to collect some diagnostic information from the repository.
> 
> If users can send us a well-written bug report which contains diagnostic
> information we would otherwise need to ask the user for, we can reduce
> the number of question-and-answer round trips between the reporter and
> the Git contributor.
> 
> Users may also wish to send a report like this to their local "Git
> expert" if they have put their repository into a state they are confused
> by.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  Makefile            |  1 +
>  builtin.h           |  1 +
>  builtin/bugreport.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  git.c               |  1 +
>  4 files changed, 53 insertions(+)
>  create mode 100644 builtin/bugreport.c
> 
> diff --git a/Makefile b/Makefile
> index 58b92af54b..132e2a52da 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1039,6 +1039,7 @@ BUILTIN_OBJS += builtin/archive.o
>  BUILTIN_OBJS += builtin/bisect--helper.o
>  BUILTIN_OBJS += builtin/blame.o
>  BUILTIN_OBJS += builtin/branch.o
> +BUILTIN_OBJS += builtin/bugreport.o
>  BUILTIN_OBJS += builtin/bundle.o
>  BUILTIN_OBJS += builtin/cat-file.o
>  BUILTIN_OBJS += builtin/check-attr.o
> diff --git a/builtin.h b/builtin.h
> index 5cf5df69f7..c6373d3289 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -135,6 +135,7 @@ int cmd_archive(int argc, const char **argv, const char *prefix);
>  int cmd_bisect__helper(int argc, const char **argv, const char *prefix);
>  int cmd_blame(int argc, const char **argv, const char *prefix);
>  int cmd_branch(int argc, const char **argv, const char *prefix);
> +int cmd_bugreport(int argc, const char **argv, const char *prefix);
>  int cmd_bundle(int argc, const char **argv, const char *prefix);
>  int cmd_cat_file(int argc, const char **argv, const char *prefix);
>  int cmd_checkout(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> new file mode 100644
> index 0000000000..2ef16440a0
> --- /dev/null
> +++ b/builtin/bugreport.c
> @@ -0,0 +1,50 @@
> +#include "builtin.h"
> +#include "stdio.h"
> +#include "strbuf.h"
> +#include "time.h"
> +
> +int get_bug_template(struct strbuf *template)

Compilation fails here for me with:
  builtin/bugreport.c:6:5: error: no previous prototype
      for ‘get_bug_template’ [-Werror=missing-prototypes]

Can you make this function static?


> +{
> +	const char template_text[] =
> +"Thank you for filling out a Git bug report!\n"
> +"Please answer the following questions to help us understand your issue.\n"
> +"\n"
> +"What did you do before the bug happened? (Steps to reproduce your issue)\n"
> +"\n"
> +"What did you expect to happen? (Expected behavior)\n"
> +"\n"
> +"What happened instead? (Actual behavior)\n"
> +"\n"
> +"What's different between what you expected and what actually happened?\n"
> +"\n"
> +"Anything else you want to add:\n"
> +"\n"
> +"Please review the rest of the bug report below.\n"
> +"You can delete any lines you don't wish to send.\n";
> +
> +	strbuf_reset(template);
> +	strbuf_add(template, template_text, strlen(template_text));
> +	return 0;
> +}
> +
> +int cmd_bugreport(int argc, const char **argv, const char *prefix)
> +{
> +	struct strbuf buffer = STRBUF_INIT;
> +	struct strbuf report_path = STRBUF_INIT;
> +	FILE *report;
> +	time_t now = time(NULL);
> +
> +	strbuf_addstr(&report_path, "git-bugreport-");
> +	strbuf_addftime(&report_path, "%F", gmtime(&now), 0, 0);
> +	strbuf_addstr(&report_path, ".txt");
> +
> +	report = fopen_for_writing(report_path.buf);
> +
> +	get_bug_template(&buffer);
> +	strbuf_write(&buffer, report);
> +
> +	fclose(report);
> +
> +	launch_editor(report_path.buf, NULL, NULL);
> +	return 0;
> +}
> diff --git a/git.c b/git.c
> index ce6ab0ece2..2d6a64f019 100644
> --- a/git.c
> +++ b/git.c
> @@ -473,6 +473,7 @@ static struct cmd_struct commands[] = {
>  	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
>  	{ "blame", cmd_blame, RUN_SETUP },
>  	{ "branch", cmd_branch, RUN_SETUP | DELAY_PAGER_CONFIG },
> +	{ "bugreport", cmd_bugreport, RUN_SETUP },
>  	{ "bundle", cmd_bundle, RUN_SETUP_GENTLY | NO_PARSEOPT },
>  	{ "cat-file", cmd_cat_file, RUN_SETUP },
>  	{ "check-attr", cmd_check_attr, RUN_SETUP },
> -- 
> 2.24.0.rc0.303.g954a862665-goog

Can you also add /git-bugreport to .gitignore?


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 3/9] bugreport: add version and system information
  2019-10-25  2:51     ` [PATCH v3 3/9] bugreport: add version and system information Emily Shaffer
  2019-10-28 13:49       ` Johannes Schindelin
@ 2019-10-29 20:43       ` Josh Steadmon
  1 sibling, 0 replies; 68+ messages in thread
From: Josh Steadmon @ 2019-10-29 20:43 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On 2019.10.24 19:51, Emily Shaffer wrote:
[snip]
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 2ef16440a0..7232d31be7 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -1,4 +1,5 @@
>  #include "builtin.h"
> +#include "bugreport.h"
>  #include "stdio.h"
>  #include "strbuf.h"
>  #include "time.h"
> @@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
>  	return 0;
>  }
>  
> +void add_header(FILE *report, const char *title)

I get the same compilation error here, can you make add_header() static
as well?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-25  2:51     ` [PATCH v3 4/9] bugreport: add config values from whitelist Emily Shaffer
  2019-10-28 14:14       ` Johannes Schindelin
@ 2019-10-29 20:58       ` Josh Steadmon
  2019-10-30  1:37         ` Junio C Hamano
  1 sibling, 1 reply; 68+ messages in thread
From: Josh Steadmon @ 2019-10-29 20:58 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On 2019.10.24 19:51, Emily Shaffer wrote:
[snip]
> diff --git a/bugreport.c b/bugreport.c
> index ada54fe583..afa4836ab1 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -1,10 +1,24 @@
>  #include "cache.h"
>  
>  #include "bugreport.h"
> +#include "config.h"
> +#include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
>  #include "version.h"
>  
> +/**
> + * A sorted list of config options which we will add to the bugreport. Managed
> + * by 'gather_whitelist(...)'.
> + */
> +struct string_list whitelist = STRING_LIST_INIT_DUP;
> +struct strbuf configs_and_values = STRBUF_INIT;
> +
> +// git version --build-options
> +// uname -a
> +// curl-config --version
> +// ldd --version
> +// echo $SHELL
>  void get_system_info(struct strbuf *sys_info)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
>  	argv_array_clear(&cp.args);
>  	strbuf_reset(&std_out);
>  }
> +
> +void gather_whitelist(struct strbuf *path)

This and git_config_bugreport() below should both be static as well.
Rather than repeating advice on the later patches, I'll just note that
any new functions that don't show up in the corresponding .h file should
be marked static.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/9] bugreport: count loose objects
  2019-10-25  2:51     ` [PATCH v3 6/9] bugreport: count loose objects Emily Shaffer
  2019-10-28 15:07       ` Johannes Schindelin
@ 2019-10-29 21:18       ` Josh Steadmon
  1 sibling, 0 replies; 68+ messages in thread
From: Josh Steadmon @ 2019-10-29 21:18 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On 2019.10.24 19:51, Emily Shaffer wrote:
[snip]
> diff --git a/bugreport.h b/bugreport.h
> index 942a5436e3..09ad0c2599 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -18,3 +18,9 @@ void get_whitelisted_config(struct strbuf *sys_info);
>   * contents of hook_info will be discarded.
>   */
>  void get_populated_hooks(struct strbuf *hook_info);
> +
> +/**
> + * Adds the output of `git count-object -vH`. The previous contents of hook_info
> + * will be discarded.
> + */
> +void get_loose_object_summary(struct strbuf *obj_info);

Looks like a copy/paste typo here, shouldn't "hook_info" be "obj_info"
in the comment?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-29 20:58       ` Josh Steadmon
@ 2019-10-30  1:37         ` Junio C Hamano
  2019-11-14 21:55           ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-10-30  1:37 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Emily Shaffer, git

Josh Steadmon <steadmon@google.com> writes:

> This and git_config_bugreport() below should both be static as well.
> Rather than repeating advice on the later patches, I'll just note that
> any new functions that don't show up in the corresponding .h file should
> be marked static.

Good advice.  

More importantly, given that "git bugreport" itself has no service
of itself to offer other existing parts of Git (it is just a "gather
various pieces of information from different places, and then
produce a text file output" application), I do not see much point in
it having its own header file that others would #include (i.e. the
include file is to define services that are offered by it).  If
there are common enough service routines invented to support the
need of bugreport.c (e.g. perhaps it wants to give more info than
what is currently available via the existing API on the contents of
in-core index), by definition of being them common enough, they
should be added to the header that can be used by both bugreport.c
and other existing users of the same subsystem (e.g. if it is about
in-core index, perhaps cache.h).

It makes perfect sense for bugreport.c to #include header files for
the Git internals to collect pieces of information from inside Git,
though.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 3/9] bugreport: add version and system information
  2019-10-28 13:49       ` Johannes Schindelin
@ 2019-11-08 21:48         ` Emily Shaffer
  2019-11-11 13:48           ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-11-08 21:48 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 02:49:29PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > diff --git a/bugreport.c b/bugreport.c
> > new file mode 100644
> > index 0000000000..ada54fe583
> > --- /dev/null
> > +++ b/bugreport.c
> > @@ -0,0 +1,55 @@
> > +#include "cache.h"
> > +
> > +#include "bugreport.h"
> > +#include "help.h"
> > +#include "run-command.h"
> > +#include "version.h"
> > +
> > +void get_system_info(struct strbuf *sys_info)
> > +{
> > +	struct child_process cp = CHILD_PROCESS_INIT;
> > +	struct strbuf std_out = STRBUF_INIT;
> > +
> > +	strbuf_reset(sys_info);
> > +
> > +	// get git version from native cmd
> 
> Please use C-style comments instead of C++ ones.
> 
> > +	strbuf_addstr(sys_info, "git version: ");
> > +	strbuf_addstr(sys_info, git_version_string);
> > +	strbuf_complete_line(sys_info);
> > +
> > +	// system call for other version info
> > +	argv_array_push(&cp.args, "uname");
> > +	argv_array_push(&cp.args, "-a");
> > +	capture_command(&cp, &std_out, 0);
> 
> Mmmkay. I am not too much of a fan of relying on the `uname` executable,
> as it can very well be a 32-bit `uname.exe` on Windows, which obviously
> would _not_ report the architecture of the machine, but something
> misleading.
> 
> Why not use the `uname()` function (that we even define in
> `compat/mingw.c`)?

Very glad to have your review and point this kind of thing out. :) It
simplifies the code, too. I wasn't sure about these system calls, so I
appreciate you suggesting alternatives.

> 
> Also, why not include the same information as `git version
> --build-options`, most importantly `GIT_HOST_CPU`?
> 
> In any case, if you are capturing the output of `uname -a`, you
> definitely will want to pass the `RUN_COMMAND_STDOUT_TO_STDERR` flag (in
> case, say, the MSYS2 `uname.exe` fails with those dreaded Cygwin error
> messages like "*** fatal error - add_item
> ("\??\D:\git\installation\Git", "/", ...) failed, errno 1").
> 
> > +
> > +	strbuf_addstr(sys_info, "uname -a: ");
> > +	strbuf_addbuf(sys_info, &std_out);
> > +	strbuf_complete_line(sys_info);
> > +
> > +	argv_array_clear(&cp.args);
> > +	strbuf_reset(&std_out);
> > +
> > +
> > +	argv_array_push(&cp.args, "curl-config");
> > +	argv_array_push(&cp.args, "--version");
> > +	capture_command(&cp, &std_out, 0);
> 
> This will be almost certainly be incorrect, as most _regular_ users
> won't have `curl-config`. I know, it is easy to forget that most Git
> users are no hard-core C developers ;-)

Heresy! :)

> 
> A better alternative would be to use `curl_version()`, guarded, of
> course, by `#ifndef NO_CURL`...
> 
> > +
> > +	strbuf_addstr(sys_info, "curl-config --version: ");
> > +	strbuf_addbuf(sys_info, &std_out);
> > +	strbuf_complete_line(sys_info);
> > +
> > +	argv_array_clear(&cp.args);
> > +	strbuf_reset(&std_out);
> > +
> > +
> > +	argv_array_push(&cp.args, "ldd");
> > +	argv_array_push(&cp.args, "--version");
> > +	capture_command(&cp, &std_out, 0);
> 
> Again, this command will only be present in few setups. I am not
> actually sure that the output of this is interesting to begin with.

It was a suggestion, I believe, from Jonathan Nieder.

> 
> What I _do_ think is that a much more interesting piece of information
> would be the exact GLIBC version, the OS name and the OS version.

The glibc version is easy; I've done that. It certainly looks nicer than
the ldd call.

I guess I may be missing something, because as I start to look into how
to the OS info, I fall down a hole of many, many preprocessor defines to
check. If that's the approach you want me to take, just say the word,
but it will be ugly :) I suppose I had hoped the uname info would give us
a close enough idea that full OS info isn't necessary.

> 
> > +
> > +	strbuf_addstr(sys_info, "ldd --version: ");
> > +	strbuf_addbuf(sys_info, &std_out);
> > +	strbuf_complete_line(sys_info);
> > +
> > +	argv_array_clear(&cp.args);
> > +	strbuf_reset(&std_out);
> > +}
> > diff --git a/bugreport.h b/bugreport.h
> > new file mode 100644
> > index 0000000000..ba216acf3f
> > --- /dev/null
> > +++ b/bugreport.h
> > @@ -0,0 +1,7 @@
> > +#include "strbuf.h"
> > +
> > +/**
> > + * Adds the Git version, `uname -a`, and `curl-config --version` to sys_info.
> > + * The previous contents of sys_info will be discarded.
> > + */
> > +void get_system_info(struct strbuf *sys_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index 2ef16440a0..7232d31be7 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -1,4 +1,5 @@
> >  #include "builtin.h"
> > +#include "bugreport.h"
> >  #include "stdio.h"
> >  #include "strbuf.h"
> >  #include "time.h"
> > @@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
> >  	return 0;
> >  }
> >
> > +void add_header(FILE *report, const char *title)
> > +{
> > +	struct strbuf buffer = STRBUF_INIT;
> > +	strbuf_addf(&buffer, "\n\n[%s]\n", title);
> > +	strbuf_write(&buffer, report);
> 
> This leaks `buffer`. Why not write into `report` via `fprintf()`
> directly?

Rather, to match the style of the rest of the builtin, modified
get_header to add the header to a passed-in strbuf instead of
modifying the file directly.

> 
> Ciao,
> Dscho
> 
> > +}
> > +
> >  int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  {
> >  	struct strbuf buffer = STRBUF_INIT;
> > @@ -43,6 +51,11 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_bug_template(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	// add other contents
> > +	add_header(report, "System Info");
> > +	get_system_info(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >


Based on Dscho's comments, each piece of system info is gathered
differently enough that I will do each in an independent commit now.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 3/9] bugreport: add version and system information
  2019-11-08 21:48         ` Emily Shaffer
@ 2019-11-11 13:48           ` Johannes Schindelin
  2019-11-14 21:42             ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-11-11 13:48 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Fri, 8 Nov 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 02:49:29PM +0100, Johannes Schindelin wrote:
>
> > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> >
> > > diff --git a/bugreport.c b/bugreport.c
> > > new file mode 100644
> > > index 0000000000..ada54fe583
> > > --- /dev/null
> > > +++ b/bugreport.c
> > > [...]
> > > +	strbuf_addstr(sys_info, "curl-config --version: ");
> > > +	strbuf_addbuf(sys_info, &std_out);
> > > +	strbuf_complete_line(sys_info);
> > > +
> > > +	argv_array_clear(&cp.args);
> > > +	strbuf_reset(&std_out);
> > > +
> > > +
> > > +	argv_array_push(&cp.args, "ldd");
> > > +	argv_array_push(&cp.args, "--version");
> > > +	capture_command(&cp, &std_out, 0);
> >
> > Again, this command will only be present in few setups. I am not
> > actually sure that the output of this is interesting to begin with.
>
> It was a suggestion, I believe, from Jonathan Nieder.

Yes, I guess Jonathan builds their Git locally, too.

It _is_ easy to forget that most users find this too involved to even
try.

Nothing like reading through a bug tracker quite frequently to learn
about the actual troubles actual users have :-)

> > What I _do_ think is that a much more interesting piece of information
> > would be the exact GLIBC version, the OS name and the OS version.
>
> The glibc version is easy; I've done that. It certainly looks nicer than
> the ldd call.
>
> I guess I may be missing something, because as I start to look into how
> to the OS info, I fall down a hole of many, many preprocessor defines to
> check. If that's the approach you want me to take, just say the word,
> but it will be ugly :) I suppose I had hoped the uname info would give us
> a close enough idea that full OS info isn't necessary.

We could go down the pre-processor route, but that would give us the OS
name and version at build time, not at run time. I think we will be
mostly interested in the latter, though.

We might need to enhance our `uname()` emulation in `compat/mingw.c`,
but I think we already have enough information there.

When it comes to glibc, I think `gnu_get_libc_version()` would get us
what we want. A trickier thing might be to determine whether we're
actually linking against glibc; I do not want to break musl builds
again, I already did that inadvertently when requiring `REG_STARTEND`
back in the days.

> > > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > > index 2ef16440a0..7232d31be7 100644
> > > --- a/builtin/bugreport.c
> > > +++ b/builtin/bugreport.c
> > > @@ -1,4 +1,5 @@
> > >  #include "builtin.h"
> > > +#include "bugreport.h"
> > >  #include "stdio.h"
> > >  #include "strbuf.h"
> > >  #include "time.h"
> > > @@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
> > >  	return 0;
> > >  }
> > >
> > > +void add_header(FILE *report, const char *title)
> > > +{
> > > +	struct strbuf buffer = STRBUF_INIT;
> > > +	strbuf_addf(&buffer, "\n\n[%s]\n", title);
> > > +	strbuf_write(&buffer, report);
> >
> > This leaks `buffer`. Why not write into `report` via `fprintf()`
> > directly?
>
> Rather, to match the style of the rest of the builtin, modified
> get_header to add the header to a passed-in strbuf instead of
> modifying the file directly.

Hmm. It makes the code less elegant in my opinion. I would rather either
render the entire bug report into a single `strbuf` and then write it
via `write_in_full()`, or use `fprintf()` directly.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 3/9] bugreport: add version and system information
  2019-11-11 13:48           ` Johannes Schindelin
@ 2019-11-14 21:42             ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-11-14 21:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Nov 11, 2019 at 02:48:13PM +0100, Johannes Schindelin wrote:
> > > > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > > > index 2ef16440a0..7232d31be7 100644
> > > > --- a/builtin/bugreport.c
> > > > +++ b/builtin/bugreport.c
> > > > @@ -1,4 +1,5 @@
> > > >  #include "builtin.h"
> > > > +#include "bugreport.h"
> > > >  #include "stdio.h"
> > > >  #include "strbuf.h"
> > > >  #include "time.h"
> > > > @@ -27,6 +28,13 @@ int get_bug_template(struct strbuf *template)
> > > >  	return 0;
> > > >  }
> > > >
> > > > +void add_header(FILE *report, const char *title)
> > > > +{
> > > > +	struct strbuf buffer = STRBUF_INIT;
> > > > +	strbuf_addf(&buffer, "\n\n[%s]\n", title);
> > > > +	strbuf_write(&buffer, report);
> > >
> > > This leaks `buffer`. Why not write into `report` via `fprintf()`
> > > directly?
> >
> > Rather, to match the style of the rest of the builtin, modified
> > get_header to add the header to a passed-in strbuf instead of
> > modifying the file directly.
> 
> Hmm. It makes the code less elegant in my opinion. I would rather either
> render the entire bug report into a single `strbuf` and then write it
> via `write_in_full()`, or use `fprintf()` directly.

Yeah, that sounds good. I will do that. :)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-30  1:37         ` Junio C Hamano
@ 2019-11-14 21:55           ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-11-14 21:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git

On Wed, Oct 30, 2019 at 10:37:13AM +0900, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > This and git_config_bugreport() below should both be static as well.
> > Rather than repeating advice on the later patches, I'll just note that
> > any new functions that don't show up in the corresponding .h file should
> > be marked static.
> 
> Good advice.  
> 
> More importantly, given that "git bugreport" itself has no service
> of itself to offer other existing parts of Git (it is just a "gather
> various pieces of information from different places, and then
> produce a text file output" application), I do not see much point in
> it having its own header file that others would #include (i.e. the
> include file is to define services that are offered by it).  If
> there are common enough service routines invented to support the
> need of bugreport.c (e.g. perhaps it wants to give more info than
> what is currently available via the existing API on the contents of
> in-core index), by definition of being them common enough, they
> should be added to the header that can be used by both bugreport.c
> and other existing users of the same subsystem (e.g. if it is about
> in-core index, perhaps cache.h).

Are you asking me to have only builtin/bugreport.c and implement each
function there, and eliminate both <basedir>/bugreport.[ch]?

It may be true that deciding to put this functionality into a library
"in case someone needs it later" was premature, so I can do that - I
just want to make sure I understand you.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
  2019-10-29 20:29       ` Josh Steadmon
@ 2019-11-16  3:11       ` Junio C Hamano
  2019-11-19 20:25         ` Emily Shaffer
  1 sibling, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-11-16  3:11 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> Teach Git how to prompt the user for a good bug report: reproduction
> steps, expected behavior, and actual behavior. Later, Git can learn how
> to collect some diagnostic information from the repository.

It makes sense, but I do not think of any good reason why this
should be implemented as a builtin.  I'd expect it would probably
need to collect more info on the running environment than otherwise
necessary for the regular Git operation, and perhaps you'd want to
even link with libraries that are not needed for the regular Git
operation to achieve that.

Can you make it a standalone binary instead to avoid bloat of the
main "git" binary?

Thanks.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-16  3:11       ` Junio C Hamano
@ 2019-11-19 20:25         ` Emily Shaffer
  2019-11-19 23:24           ` Johannes Schindelin
                             ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-11-19 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Nov 16, 2019 at 12:11:38PM +0900, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Teach Git how to prompt the user for a good bug report: reproduction
> > steps, expected behavior, and actual behavior. Later, Git can learn how
> > to collect some diagnostic information from the repository.
> 
> It makes sense, but I do not think of any good reason why this
> should be implemented as a builtin.  I'd expect it would probably
> need to collect more info on the running environment than otherwise
> necessary for the regular Git operation, and perhaps you'd want to
> even link with libraries that are not needed for the regular Git
> operation to achieve that.
> 
> Can you make it a standalone binary instead to avoid bloat of the
> main "git" binary?

Sure. This would fix some other issues (needing to link against curl to
get the curl version, which is exactly what you implied). I wasn't
certain which circumstances a standalone binary was preferred, but I
agree with your reasoning here for sure.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 9/9] bugreport: print contents of alternates file
  2019-10-28 15:57       ` Johannes Schindelin
@ 2019-11-19 20:40         ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-11-19 20:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 04:57:01PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > In some cases, it could be that the user is having a problem with an
> > object which isn't present in their normal object directory. We can get
> > a hint that that might be the case by examining the list of alternates
> > where their object may be stored instead.
> 
> Doesn't this open the possibility of leaking project's (possibly NDA'ed) names?
> 
> I could imagine that we might rather want to count the alternates, and
> maybe separate into those alternates that actually exist and alternates
> that do not exist (which would produce a warning that the user might
> have trained themselves to ignore).

Sounds reasonable. Will do.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 20:25         ` Emily Shaffer
@ 2019-11-19 23:24           ` Johannes Schindelin
  2019-11-20  0:37             ` Junio C Hamano
  2019-11-19 23:31           ` Johannes Schindelin
  2019-11-20  0:32           ` Junio C Hamano
  2 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-11-19 23:24 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, git

Hi,

On Tue, 19 Nov 2019, Emily Shaffer wrote:

> On Sat, Nov 16, 2019 at 12:11:38PM +0900, Junio C Hamano wrote:
> > Emily Shaffer <emilyshaffer@google.com> writes:
> >
> > > Teach Git how to prompt the user for a good bug report: reproduction
> > > steps, expected behavior, and actual behavior. Later, Git can learn how
> > > to collect some diagnostic information from the repository.
> >
> > It makes sense, but I do not think of any good reason why this
> > should be implemented as a builtin.  I'd expect it would probably
> > need to collect more info on the running environment than otherwise
> > necessary for the regular Git operation, and perhaps you'd want to
> > even link with libraries that are not needed for the regular Git
> > operation to achieve that.
> >
> > Can you make it a standalone binary instead to avoid bloat of the
> > main "git" binary?
>
> Sure. This would fix some other issues (needing to link against curl to
> get the curl version, which is exactly what you implied). I wasn't
> certain which circumstances a standalone binary was preferred, but I
> agree with your reasoning here for sure.

FWIW I disagree with the idea that a tiny built-in command like
`bugreport` would "bloat" the main `git` binary.

In contrast, I think that stand-alone commands _do_ bloat. Look here:

$ ls -lh git-daemon.exe git-credential-store.exe
-rwxr-xr-x 1 me 4096 1.8M Nov  6 13:43 git-credential-store.exe*
-rwxr-xr-x 1 me 4096 1.8M Nov  6 13:43 git-daemon.exe*

In other words, even a super simple stand-alone like `credential-store`
(the `credential-store.c` file has only 198 lines!) weighs in with almost
two megabytes.

So I fear that the claim that a stand-alone command would add less bloat
than a built-in one, especially for a relatively small thing like
`bugreport` has it exactly backwards.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 20:25         ` Emily Shaffer
  2019-11-19 23:24           ` Johannes Schindelin
@ 2019-11-19 23:31           ` Johannes Schindelin
  2019-11-20  0:39             ` Junio C Hamano
  2019-11-20  2:09             ` Emily Shaffer
  2019-11-20  0:32           ` Junio C Hamano
  2 siblings, 2 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-11-19 23:31 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, git

Hi Emily,

On Tue, 19 Nov 2019, Emily Shaffer wrote:

> [...] some other issues (needing to link against curl to get the curl
> version, which is exactly what you implied) [...]

I did suggest on IRC to teach `git-remote-https` an option where it prints
the cURL version (and build options) and exits.

This would have the further advantage of making sure that the correct
information is included. If you have two separate binaries that both link
to cURL, they could still be linked statically, to different versions of
cURL (this could happen e.g. if you have a `git-remote-https` from a
different build in your path).

In short: I still think that it would make for a much better idea to query
the `git-remote-https` executable for the version information than to link
`bugreport` to libcurl.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 20:25         ` Emily Shaffer
  2019-11-19 23:24           ` Johannes Schindelin
  2019-11-19 23:31           ` Johannes Schindelin
@ 2019-11-20  0:32           ` Junio C Hamano
  2 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2019-11-20  0:32 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

>> Can you make it a standalone binary instead to avoid bloat of the
>> main "git" binary?
>
> Sure. This would fix some other issues (needing to link against curl to
> get the curl version, which is exactly what you implied).

Hmph, I actually was not thinking about the cURL library.  

I imagined that your tool may need to learn more about the system in
order to make the report and for that there may be libraries that
makes it easy than say reading directly from the /proc filesystem
etc. that you may end up using.  Unlike cURL, such a library would
have no use in the rest of Git anywhere.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 23:24           ` Johannes Schindelin
@ 2019-11-20  0:37             ` Junio C Hamano
  2019-11-20 10:51               ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2019-11-20  0:37 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> FWIW I disagree with the idea that a tiny built-in command like
> `bugreport` would "bloat" the main `git` binary.
>
> In contrast, I think that stand-alone commands _do_ bloat. Look here:

I probably should have clarified that "bloat" in the context of this
discussion is not about on-disk space.  It is about resources needed
to run "git status/diff/etc" that are everyday local commands that
are expected to be lightweight, i.e. the same criteria applied when
making the networking part (which the user expects to be coffee time)
separate from them.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 23:31           ` Johannes Schindelin
@ 2019-11-20  0:39             ` Junio C Hamano
  2019-11-20  2:09             ` Emily Shaffer
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2019-11-20  0:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I did suggest on IRC to teach `git-remote-https` an option where it prints
> the cURL version (and build options) and exits.

I like that.  You ask the exact binary what (it thinks) it uses, so
that there won't be skew between the view by "git remote-http" and
"git bugreport" on the world.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-19 23:31           ` Johannes Schindelin
  2019-11-20  0:39             ` Junio C Hamano
@ 2019-11-20  2:09             ` Emily Shaffer
  1 sibling, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-11-20  2:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Wed, Nov 20, 2019 at 12:31:42AM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Tue, 19 Nov 2019, Emily Shaffer wrote:
> 
> > [...] some other issues (needing to link against curl to get the curl
> > version, which is exactly what you implied) [...]
> 
> I did suggest on IRC to teach `git-remote-https` an option where it prints
> the cURL version (and build options) and exits.
> 
> This would have the further advantage of making sure that the correct
> information is included. If you have two separate binaries that both link
> to cURL, they could still be linked statically, to different versions of
> cURL (this could happen e.g. if you have a `git-remote-https` from a
> different build in your path).

Yeah, it's a good point and an angle I hadn't thought of. Thanks.

> In short: I still think that it would make for a much better idea to query
> the `git-remote-https` executable for the version information than to link
> `bugreport` to libcurl.

Will do - the git-bugreport standalone will invoke the git-remote-https
standalone to ask for version info.

 - Emily

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/9] bugreport: add tool to generate debugging info
  2019-11-20  0:37             ` Junio C Hamano
@ 2019-11-20 10:51               ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-11-20 10:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emily Shaffer, git

Hi Junio,

On Wed, 20 Nov 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > FWIW I disagree with the idea that a tiny built-in command like
> > `bugreport` would "bloat" the main `git` binary.
> >
> > In contrast, I think that stand-alone commands _do_ bloat. Look here:
>
> I probably should have clarified that "bloat" in the context of this
> discussion is not about on-disk space.  It is about resources needed
> to run "git status/diff/etc" that are everyday local commands that
> are expected to be lightweight, i.e. the same criteria applied when
> making the networking part (which the user expects to be coffee time)
> separate from them.

I guess I still do not understand.

Are you suggesting that `bugreport` would be unwelcome as a built-in? If
so, I would really like to know why. Because I think it would make for a
very fine built-in. I interact with users all the time, and a good tool to
provide good information to accompany a bug report is definitely something
that would improve the current situation.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/9] bugreport: count loose objects
  2019-10-28 15:07       ` Johannes Schindelin
@ 2019-12-10 22:34         ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-12-10 22:34 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 04:07:40PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > The number of unpacked objects in a user's repository may help us
> > understand the root of the problem they're seeing, especially if a
> > command is running unusually slowly.
> >
> > Rather than directly invoking 'git-count-objects', which may sometimes
> > fail unexpectedly on Git for Windows, manually count the contents of
> > .git/objects. Additionally, since we may wish to inspect other
> > directories' contents for bugreport in the future, put the directory
> > listing into a helper function.
> 
> Thank you, much appreciated!
> 
> I guess the next step is to count the number of packs, and the number of
> submodules ;-)
> 
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> >  bugreport.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++
> >  bugreport.h         |  6 ++++
> >  builtin/bugreport.c |  4 +++
> >  3 files changed, 82 insertions(+)
> >
> > diff --git a/bugreport.c b/bugreport.c
> > index 9d7f44ff28..54e1d47103 100644
> > --- a/bugreport.c
> > +++ b/bugreport.c
> > @@ -5,8 +5,11 @@
> >  #include "exec-cmd.h"
> >  #include "help.h"
> >  #include "run-command.h"
> > +#include "strbuf.h"
> 
> Why not append this to the end of the `#include` list, as is common in
> Git's commit history?
> 
> >  #include "version.h"
> >
> > +#include "dirent.h"
> 
> This header (although with pointy brackets instead of double quotes) is
> already included in `git-compat-util.h`
> 
> > +
> >  /**
> >   * A sorted list of config options which we will add to the bugreport. Managed
> >   * by 'gather_whitelist(...)'.
> > @@ -147,3 +150,72 @@ void get_populated_hooks(struct strbuf *hook_info)
> >  		}
> >  	}
> >  }
> > +
> > +/**
> > + * Fill 'contents' with the contents of the dir at 'dirpath'.
> 
> Since you start this comment in JavaDoc style, there should be an almost
> empty line after this one ("almost" because it still contains the
> asterisk, of course).
> 
> > + * If 'filter' is nonzero, the contents are filtered on d_type as 'type' - see
> > + * 'man readdir'. opendir() doesn't take string length as an arg, so don't
> > + * bother passing it in.
> > + */
> > +void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,
> 
> Shouldn't this be `static`?
> 
> > +			  int filter, unsigned char type)
> > +{
> > +	struct dirent *dir = NULL;
> > +	DIR *dh = NULL;
> > +
> > +	dh = opendir(dirpath->buf);
> > +	while (dh && (dir = readdir(dh))) {
> > +		if (!filter || type == dir->d_type) {
> > +			string_list_append(contents, dir->d_name);
> > +		}
> > +	}
> > +}
> > +
> > +
> > +void get_object_counts(struct strbuf *obj_info)
> 
> Oops. This function is no longer used.
> 
> > +{
> > +	struct child_process cp = CHILD_PROCESS_INIT;
> > +	struct strbuf std_out = STRBUF_INIT;
> > +
> > +	argv_array_push(&cp.args, "count-objects");
> > +	argv_array_push(&cp.args, "-vH");
> > +	cp.git_cmd = 1;
> > +	capture_command(&cp, &std_out, 0);
> > +
> > +	strbuf_reset(obj_info);
> > +	strbuf_addstr(obj_info, "git-count-objects -vH:\n");
> > +	strbuf_addbuf(obj_info, &std_out);
> > +}
> > +
> > +void get_loose_object_summary(struct strbuf *obj_info)
> > +{
> > +	struct strbuf dirpath = STRBUF_INIT;
> > +	struct string_list subdirs = STRING_LIST_INIT_DUP;
> > +	struct string_list_item *subdir;
> > +
> > +	strbuf_reset(obj_info);
> > +
> > +	strbuf_addstr(&dirpath, get_object_directory());
> > +	strbuf_complete(&dirpath, '/');
> > +
> > +	list_contents_of_dir(&subdirs, &dirpath, 1, DT_DIR);
> > +
> > +	for_each_string_list_item(subdir, &subdirs)
> > +	{
> > +		struct strbuf subdir_buf = STRBUF_INIT;
> > +		struct string_list objects = STRING_LIST_INIT_DUP;
> > +
> > +		/*
> > +		 * Only interested in loose objects - so dirs named with the
> > +		 * first byte of the object ID
> > +		 */
> > +		if (strlen(subdir->string) != 2 || !strcmp(subdir->string, ".."))
> > +			continue;
> > +
> > +		strbuf_addbuf(&subdir_buf, &dirpath);
> > +		strbuf_addstr(&subdir_buf, subdir->string);
> > +		list_contents_of_dir(&objects, &subdir_buf, 0, 0);
> > +		strbuf_addf(obj_info, "%s: %d objects\n", subdir->string,
> > +			    objects.nr);
> 
> Hmm. Not only does this leak `objects`, it also throws away the contents
> that we so painfully constructed.
> 
> Wouldn't it make more sense to do something like this instead?
> 
> static int is_hex(const char *string, size_t count)
> {
> 	for (; count; string++, count--)
> 		if (hexval(*string) < 0)
> 			return 0;
> 	return 1;
> }

True if the string matches [0-9a-fA-F]*, false otherwise.

> 
> static ssize_t count_loose_objects(struct strbuf *objects_path)
> {
> 	ssize_t ret = 0;
> 	size_t len;
> 	struct dirent *d;
> 	DIR *dir, *subdir;
> 
> 	dir = opendir(objects_path->buf);
> 	if (!dir)
> 		return -1;
> 
> 	strbuf_complete(objects_path, '/');

starting with .git/objects/...

> 	len = objects_path->len;
> 	while ((d = readdir(dir))) {
For all contents of dir...
> 		if (d->d_type != DT_DIR)
> 			continue;
..which are directories...
> 		strbuf_setlen(objects_path, len);
> 		strbuf_addstr(objects_path, d->d_name);
> 		subdir = opendir(objects_path->buf);
show all the contents of that subdirectory.
> 		if (!subdir)
> 			continue;
> 		while ((d = readdir(subdir)))
for each regular file there,
> 			if (d->dt_type == DT_REG &&
if it's a regular file,
> 			    is_hex(dir->d_name, the_repository->hash_algo->hexsz))
and the directory is named like an object prefix,
> 				ret++;
increase the total count of numbers of loose objects.
> 		closedir(subdir);
> 	}
> 	closedir(dir);
> 	strbuf_reset(objects_path, len);
> 	return ret;
> }

The foremost difference here is that the loose object count previously
was not given in total - instead, it was divided up by object prefix. I
can't speak to whether that's actually very important to know about, but
the original request from stolee was to have the summary by dirname.
That's certainly still possible with a light modification to this code.

(Suggestion from Stolee, some months ago earlier in this thread:)

  As mentioned before, I've sometimes found it helpful to know the data shape for the object
  store. Having a few extra steps such as the following could be nice:
  
          echo "[Loose Objects]"
          for objdir in $(find "$GIT_DIR/objects/??" -type d)
          do
                  echo "$objdir: $(ls $objdir | wc -l)"
          done
          echo
  ...

Checking that the directory we're about to inspect is hexval rather than
that it's only two characters long is also an interesting point. I'd
probably rather do both, though, since I think we both missed
futureproofing by a little bit.

Dropping the many string_list is fine by me - call it object-oriented
habits dying hard.

I worry somewhat on delving into every directory and only afterwards
checking whether the directory is one we care about, but that's an easy
modification too to your basic suggestion ("don't use a string list for
that, silly").

Thanks. I'll make the changes with the modifications I mentioned and add
your Helped-by line on this commit.

 - Emily

> 
> Ciao,
> Dscho
> 
> > +	}
> > +}
> > diff --git a/bugreport.h b/bugreport.h
> > index 942a5436e3..09ad0c2599 100644
> > --- a/bugreport.h
> > +++ b/bugreport.h
> > @@ -18,3 +18,9 @@ void get_whitelisted_config(struct strbuf *sys_info);
> >   * contents of hook_info will be discarded.
> >   */
> >  void get_populated_hooks(struct strbuf *hook_info);
> > +
> > +/**
> > + * Adds the output of `git count-object -vH`. The previous contents of hook_info
> > + * will be discarded.
> > + */
> > +void get_loose_object_summary(struct strbuf *obj_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index a0eefba498..b2ab194207 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -64,6 +64,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_populated_hooks(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	add_header(report, "Object Counts");
> > +	get_loose_object_summary(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 7/9] bugreport: add packed object summary
  2019-10-28 15:43       ` Johannes Schindelin
@ 2019-12-11  0:29         ` Emily Shaffer
  2019-12-11 13:37           ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-12-11  0:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 04:43:50PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > Alongside the list of loose objects, it's useful to see the list of
> > object packs as well. It can help us to examine what Git did and did not
> > pack.
> 
> Yes, I was write! Packs are next ;-)
> 
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> >  bugreport.c         | 21 +++++++++++++++++++++
> >  bugreport.h         |  6 ++++++
> >  builtin/bugreport.c |  4 ++++
> >  3 files changed, 31 insertions(+)
> >
> > diff --git a/bugreport.c b/bugreport.c
> > index 54e1d47103..79ddb8baaa 100644
> > --- a/bugreport.c
> > +++ b/bugreport.c
> > @@ -219,3 +219,24 @@ void get_loose_object_summary(struct strbuf *obj_info)
> >  			    objects.nr);
> >  	}
> >  }
> > +
> > +void get_packed_object_summary(struct strbuf *obj_info)
> > +{
> > +	struct strbuf dirpath = STRBUF_INIT;
> > +	struct string_list contents = STRING_LIST_INIT_DUP;
> > +	struct string_list_item *entry;
> > +
> > +	strbuf_reset(obj_info);
> > +
> > +	strbuf_addstr(&dirpath, get_object_directory());
> > +	strbuf_complete(&dirpath, '/');
> > +	strbuf_addstr(&dirpath, "pack/");
> > +	list_contents_of_dir(&contents, &dirpath, 0, 0);
> > +
> > +	// list full contents of $GIT_OBJECT_DIRECTORY/pack/
> > +	for_each_string_list_item(entry, &contents) {
> > +		strbuf_addbuf(obj_info, &dirpath);
> > +		strbuf_addstr(obj_info, entry->string);
> > +		strbuf_complete_line(obj_info);
> > +	}
> > +}
> 
> Okay, but I think that you will want to discern between regular `.pack`
> files, `.idx` files and `tmp_*` files.

Discern in what way? How would you like to see them treated separately?
They're all being listed here, not just counted, so it seems to me like
I can read the generated bugreport and see which files are index, pack,
or temporary here.

> 
> > diff --git a/bugreport.h b/bugreport.h
> > index 09ad0c2599..11ff7df41b 100644
> > --- a/bugreport.h
> > +++ b/bugreport.h
> > @@ -24,3 +24,9 @@ void get_populated_hooks(struct strbuf *hook_info);
> >   * will be discarded.
> >   */
> >  void get_loose_object_summary(struct strbuf *obj_info);
> > +
> > +/**
> > + * Adds a list of the contents of '.git/objects/pack'. The previous contents of
> > + * hook_info will be discarded.
> > + */
> > +void get_packed_object_summary(struct strbuf *obj_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index b2ab194207..da91a3944e 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -68,6 +68,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_loose_object_summary(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	add_header(report, "Packed Object Summary");
> > +	get_packed_object_summary(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> 
> Hmm. At this point, I am unclear whether you want to write into an
> `strbuf`, or directly into a `FILE *`? I would rather have only one, not
> a mix.
> 
> Ciao,
> Dscho
> 
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 7/9] bugreport: add packed object summary
  2019-12-11  0:29         ` Emily Shaffer
@ 2019-12-11 13:37           ` Johannes Schindelin
  2019-12-11 20:52             ` Emily Shaffer
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2019-12-11 13:37 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Tue, 10 Dec 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 04:43:50PM +0100, Johannes Schindelin wrote:
> > Hi Emily,
> >
> > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> >
> > > Alongside the list of loose objects, it's useful to see the list of
> > > object packs as well. It can help us to examine what Git did and did not
> > > pack.
> >
> > Yes, I was write! Packs are next ;-)
> >
> > >
> > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > > ---
> > >  bugreport.c         | 21 +++++++++++++++++++++
> > >  bugreport.h         |  6 ++++++
> > >  builtin/bugreport.c |  4 ++++
> > >  3 files changed, 31 insertions(+)
> > >
> > > diff --git a/bugreport.c b/bugreport.c
> > > index 54e1d47103..79ddb8baaa 100644
> > > --- a/bugreport.c
> > > +++ b/bugreport.c
> > > @@ -219,3 +219,24 @@ void get_loose_object_summary(struct strbuf *obj_info)
> > >  			    objects.nr);
> > >  	}
> > >  }
> > > +
> > > +void get_packed_object_summary(struct strbuf *obj_info)
> > > +{
> > > +	struct strbuf dirpath = STRBUF_INIT;
> > > +	struct string_list contents = STRING_LIST_INIT_DUP;
> > > +	struct string_list_item *entry;
> > > +
> > > +	strbuf_reset(obj_info);
> > > +
> > > +	strbuf_addstr(&dirpath, get_object_directory());
> > > +	strbuf_complete(&dirpath, '/');
> > > +	strbuf_addstr(&dirpath, "pack/");
> > > +	list_contents_of_dir(&contents, &dirpath, 0, 0);
> > > +
> > > +	// list full contents of $GIT_OBJECT_DIRECTORY/pack/
> > > +	for_each_string_list_item(entry, &contents) {
> > > +		strbuf_addbuf(obj_info, &dirpath);
> > > +		strbuf_addstr(obj_info, entry->string);
> > > +		strbuf_complete_line(obj_info);
> > > +	}
> > > +}
> >
> > Okay, but I think that you will want to discern between regular `.pack`
> > files, `.idx` files and `tmp_*` files.
>
> Discern in what way? How would you like to see them treated separately?
> They're all being listed here, not just counted, so it seems to me like
> I can read the generated bugreport and see which files are index, pack,
> or temporary here.

I take your word for it (sorry, it's been half an eternity since I wrapped
my head around the diff, I forgotten pretty much all about it ;-))

Ciao,
Dscho

>
> >
> > > diff --git a/bugreport.h b/bugreport.h
> > > index 09ad0c2599..11ff7df41b 100644
> > > --- a/bugreport.h
> > > +++ b/bugreport.h
> > > @@ -24,3 +24,9 @@ void get_populated_hooks(struct strbuf *hook_info);
> > >   * will be discarded.
> > >   */
> > >  void get_loose_object_summary(struct strbuf *obj_info);
> > > +
> > > +/**
> > > + * Adds a list of the contents of '.git/objects/pack'. The previous contents of
> > > + * hook_info will be discarded.
> > > + */
> > > +void get_packed_object_summary(struct strbuf *obj_info);
> > > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > > index b2ab194207..da91a3944e 100644
> > > --- a/builtin/bugreport.c
> > > +++ b/builtin/bugreport.c
> > > @@ -68,6 +68,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> > >  	get_loose_object_summary(&buffer);
> > >  	strbuf_write(&buffer, report);
> > >
> > > +	add_header(report, "Packed Object Summary");
> > > +	get_packed_object_summary(&buffer);
> > > +	strbuf_write(&buffer, report);
> > > +
> >
> > Hmm. At this point, I am unclear whether you want to write into an
> > `strbuf`, or directly into a `FILE *`? I would rather have only one, not
> > a mix.
> >
> > Ciao,
> > Dscho
> >
> > >  	fclose(report);
> > >
> > >  	launch_editor(report_path.buf, NULL, NULL);
> > > --
> > > 2.24.0.rc0.303.g954a862665-goog
> > >
> > >
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-10-28 14:14       ` Johannes Schindelin
@ 2019-12-11 20:48         ` Emily Shaffer
  2019-12-15 17:30           ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-12-11 20:48 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote:
> Hi Emily,

Sorry for the delay in replying. This work has been backburnered and
this mail slipped through the cracks.

> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > Teach bugreport to gather the values of config options which are present
> > in 'git-bugreport-config-whitelist'.
> >
> > Many config options are sensitive, and many Git add-ons use config
> > options which git-core does not know about; it is better only to gather
> > config options which we know to be safe, rather than excluding options
> > which we know to be unsafe.
> 
> Should we still have the `// bugreport-exclude` comments, then?

They were optional (useless) before too. I can remove them if you want;
I suppose I like the idea of having precedent if someone wants to build
their own internal version with opt-out configs rather than opt-in. I
can remove them if we want; it doesn't matter very much to me either
way.

> 
> >
> > Reading the whitelist into memory and sorting it saves us time -
> > since git_config_bugreport() is called for every option the user has
> > configured, limiting the file IO to one open/read/close and performing
> > option lookup in sublinear time is a useful optimization.
> 
> Maybe we even want a hashmap? That would reduce the time complexity even
> further.

Sure, we can do it. I'll make that change.

> 
> > diff --git a/bugreport.c b/bugreport.c
> > index ada54fe583..afa4836ab1 100644
> > --- a/bugreport.c
> > +++ b/bugreport.c
> > @@ -1,10 +1,24 @@
> >  #include "cache.h"
> >
> >  #include "bugreport.h"
> > +#include "config.h"
> > +#include "exec-cmd.h"
> >  #include "help.h"
> >  #include "run-command.h"
> >  #include "version.h"
> >
> > +/**
> > + * A sorted list of config options which we will add to the bugreport. Managed
> > + * by 'gather_whitelist(...)'.
> > + */
> > +struct string_list whitelist = STRING_LIST_INIT_DUP;
> > +struct strbuf configs_and_values = STRBUF_INIT;
> > +
> > +// git version --build-options
> > +// uname -a
> > +// curl-config --version
> > +// ldd --version
> > +// echo $SHELL
> 
> These comments probably want to move to a single, C style comment, and
> they probably want to be introduced together with `get_system_info()`.

Yeah, it's stale and has been removed now. It was less commentary and
more todo list for author ;)

> 
> I also have to admit that I might have missed where `$SHELL` was added
> to the output...

I skipped it entirely since bugreport doesn't run in shell anymore. If
you have advice for gathering the user's shell I can try to add it; is
there such a difference between, say, a Debian user using bash and a
Debian user using zsh? I suppose it could be useful if someone has an
issue with GIT_PS1, or with autocompletion. I'll look into gathering it.

> 
> >  void get_system_info(struct strbuf *sys_info)
> >  {
> >  	struct child_process cp = CHILD_PROCESS_INIT;
> > @@ -53,3 +67,39 @@ void get_system_info(struct strbuf *sys_info)
> >  	argv_array_clear(&cp.args);
> >  	strbuf_reset(&std_out);
> >  }
> > +
> > +void gather_whitelist(struct strbuf *path)
> > +{
> > +	struct strbuf tmp = STRBUF_INIT;
> > +	strbuf_read_file(&tmp, path->buf, 0);
> > +	string_list_init(&whitelist, 1);
> > +	string_list_split(&whitelist, tmp.buf, '\n', -1);
> > +	string_list_sort(&whitelist);
> > +}
> > +
> > +int git_config_bugreport(const char *var, const char *value, void *cb)
> > +{
> > +	if (string_list_has_string(&whitelist, var)) {
> > +		strbuf_addf(&configs_and_values,
> > +			    "%s : %s\n",
> > +			    var, value);
> 
> A quite useful piece of information would be the config source. Not sure
> whether we can do that outside of `config.c` yet...

It's possible. I can add it.

> 
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +void get_whitelisted_config(struct strbuf *config_info)
> > +{
> > +	struct strbuf path = STRBUF_INIT;
> > +
> > +	strbuf_addstr(&path, git_exec_path());
> > +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
> 
> Hmm. I would have expected this patch to come directly after the patch
> 2/9 that generates that white-list, and I would also have expected that
> to be pre-sorted, and compiled in.
> 
> Do you want users to _edit_ the file in the exec path? In general, that
> path will be write-protected, though. A better alternative would
> probably be to compile in a hard-coded list, and to allow including more
> values e.g. by offering command-line options to specify config setting
> patterns. But if we allow patterns, we might actually want to have those
> exclusions to prevent sensitive data from being included.

Hm, interesting. Do we have precedent for compiling in a header
generated during the build process? I think I saw one when I was adding
this script - I'll take a look.

> 
> > +
> > +	gather_whitelist(&path);
> > +	strbuf_init(&configs_and_values, whitelist.nr);
> > +
> > +	git_config(git_config_bugreport, NULL);
> > +
> > +	strbuf_reset(config_info);
> > +	strbuf_addbuf(config_info, &configs_and_values);
> > +}
> > diff --git a/bugreport.h b/bugreport.h
> > index ba216acf3f..7413e7e1be 100644
> > --- a/bugreport.h
> > +++ b/bugreport.h
> > @@ -5,3 +5,10 @@
> >   * The previous contents of sys_info will be discarded.
> >   */
> >  void get_system_info(struct strbuf *sys_info);
> > +
> > +/**
> 
> I also frequently use JavaDoc-style `/**`, but I am not sure that this
> is actually desired in Git's source code ;-)
> 
> > + * Adds the values of the config items listed in
> > + * 'git-bugreport-config-whitelist' to config_info. The previous contents of
> > + * config_info will be discarded.
> > + */
> > +void get_whitelisted_config(struct strbuf *sys_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index 7232d31be7..70fe0d2b85 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -56,6 +56,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_system_info(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	add_header(report, "Whitelisted Config");
> 
> Quite honestly, I would like to avoid the term "whitelist" for good. How
> about "Selected config settings" instead?

Will do - thanks for the callout.

> 
> Thanks,
> Dscho
> 
> > +	get_whitelisted_config(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 5/9] bugreport: collect list of populated hooks
  2019-10-28 14:31       ` Johannes Schindelin
@ 2019-12-11 20:51         ` Emily Shaffer
  2019-12-15 17:40           ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Emily Shaffer @ 2019-12-11 20:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Oct 28, 2019 at 03:31:36PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Thu, 24 Oct 2019, Emily Shaffer wrote:
> 
> > Occasionally a failure a user is seeing may be related to a specific
> > hook which is being run, perhaps without the user realizing. While the
> > contents of hooks can be sensitive - containing user data or process
> > information specific to the user's organization - simply knowing that a
> > hook is being run at a certain stage can help us to understand whether
> > something is going wrong.
> >
> > Without a definitive list of hook names within the code, we compile our
> > own list from the documentation. This is likely prone to bitrot. To
> > reduce the amount of code humans need to read, we turn the list into a
> > string_list and iterate over it (as we are calling the same find_hook
> > operation on each string). However, since bugreport should primarily be
> > called by the user, the performance loss from massaging the string
> > seems acceptable.
> 
> Good idea!
> 
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> >  bugreport.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  bugreport.h         |  6 ++++++
> >  builtin/bugreport.c |  4 ++++
> >  3 files changed, 54 insertions(+)
> >
> > diff --git a/bugreport.c b/bugreport.c
> > index afa4836ab1..9d7f44ff28 100644
> > --- a/bugreport.c
> > +++ b/bugreport.c
> > @@ -103,3 +103,47 @@ void get_whitelisted_config(struct strbuf *config_info)
> >  	strbuf_reset(config_info);
> >  	strbuf_addbuf(config_info, &configs_and_values);
> >  }
> > +
> > +void get_populated_hooks(struct strbuf *hook_info)
> > +{
> > +	/*
> > +	 * Doesn't look like there is a list of all possible hooks; so below is
> > +	 * a transcription of `git help hook`.
> > +	 */
> > +	const char *hooks = "applypatch-msg,"
> > +			    "pre-applypatch,"
> > +			    "post-applypatch,"
> > +			    "pre-commit,"
> > +			    "pre-merge-commit,"
> > +			    "prepare-commit-msg,"
> > +			    "commit-msg,"
> > +			    "post-commit,"
> > +			    "pre-rebase,"
> > +			    "post-checkout,"
> > +			    "post-merge,"
> > +			    "pre-push,"
> > +			    "pre-receive,"
> > +			    "update,"
> > +			    "post-receive,"
> > +			    "post-update,"
> > +			    "push-to-checkout,"
> > +			    "pre-auto-gc,"
> > +			    "post-rewrite,"
> > +			    "sendemail-validate,"
> > +			    "fsmonitor-watchman,"
> > +			    "p4-pre-submit,"
> > +			    "post-index-changex";
> 
> Well, this is disappointing: I tried to come up with a scripted way to
> extract the commit names from our source code, and I failed! This is
> where I gave up:
> 
> 	git grep run_hook |
> 	sed -n 's/.*run_hook[^(]*([^,]*, *\([^,]*\).*/\1/p' |
> 	sed -e '/^name$/d' -e '/^const char \*name$/d' \
> 		-e 's/push_to_checkout_hook/"push-to-checkout"/' |
> 	sort |
> 	uniq
> 
> This prints only the following hook names:
> 
> 	"applypatch-msg"
> 	"post-applypatch"
> 	"post-checkout"
> 	"post-index-change"
> 	"post-merge"
> 	"pre-applypatch"
> 	"pre-auto-gc"
> 	"pre-rebase"
> 	"prepare-commit-msg"
> 	"push-to-checkout"
> 
> But at least it was easy to script the extracting from the
> Documentation:
> 
> 	sed -n '/^[a-z]/{N;/\n~~~/{s/\n.*//;p}}' \
> 		Documentation/githooks.txt
> 

To be totally frank, I'm not keen on spending a lot of time with
scripting this. My parallel work with config-based hooks will also
involve an in-code source of truth for available hooknames; I'd like to
punt on some scripting, putting it in the makefile, etc blah since I
know I'm planning to fix this particular annoyance at the source - and
since it looks like bugreport will stay in the review phase for at least
a little longer.

> > +	struct string_list hooks_list = STRING_LIST_INIT_DUP;
> > +	struct string_list_item *iter = NULL;
> > +
> > +	strbuf_reset(hook_info);
> > +
> > +	string_list_split(&hooks_list, hooks, ',', -1);
> > +
> > +	for_each_string_list_item(iter, &hooks_list) {
> 
> This should definitely be done at compile time, I think. We should be
> able to generate an appropriate header via something like this:
> 
> 	cat >hook-names.h <<-EOF
> 	static const char *hook_names[] = {
> 	$(sed -n '/^[a-z]/{N;/\n~~~/{s/\n.*/",/;s/^/	"/;p}}' \
> 		Documentation/githooks.txt)
> 	};
> 	EOF
> 
> Then you would use a simple `for()` loop using `ARRAY_SIZE()` to
> iterate over the hook names.
> 
> > +		if (find_hook(iter->string)) {
> > +			strbuf_addstr(hook_info, iter->string);
> > +			strbuf_complete_line(hook_info);
> > +		}
> > +	}
> > +}
> > diff --git a/bugreport.h b/bugreport.h
> > index 7413e7e1be..942a5436e3 100644
> > --- a/bugreport.h
> > +++ b/bugreport.h
> > @@ -12,3 +12,9 @@ void get_system_info(struct strbuf *sys_info);
> >   * config_info will be discarded.
> >   */
> >  void get_whitelisted_config(struct strbuf *sys_info);
> > +
> > +/**
> > + * Adds the paths to all configured hooks (but not their contents). The previous
> > + * contents of hook_info will be discarded.
> > + */
> > +void get_populated_hooks(struct strbuf *hook_info);
> > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > index 70fe0d2b85..a0eefba498 100644
> > --- a/builtin/bugreport.c
> > +++ b/builtin/bugreport.c
> > @@ -60,6 +60,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> >  	get_whitelisted_config(&buffer);
> >  	strbuf_write(&buffer, report);
> >
> > +	add_header(report, "Populated Hooks");
> 
> Wait! I should have stumbled over this in an earlier patch. The
> `add_header()` function should not take a `FILE *` parameter at all, but
> instead an `struct strbuf *` one!
> 
> Ciao,
> Dscho
> 
> > +	get_populated_hooks(&buffer);
> > +	strbuf_write(&buffer, report);
> > +
> >  	fclose(report);
> >
> >  	launch_editor(report_path.buf, NULL, NULL);
> > --
> > 2.24.0.rc0.303.g954a862665-goog
> >
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 7/9] bugreport: add packed object summary
  2019-12-11 13:37           ` Johannes Schindelin
@ 2019-12-11 20:52             ` Emily Shaffer
  0 siblings, 0 replies; 68+ messages in thread
From: Emily Shaffer @ 2019-12-11 20:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Wed, Dec 11, 2019 at 02:37:10PM +0100, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Tue, 10 Dec 2019, Emily Shaffer wrote:
> 
> > On Mon, Oct 28, 2019 at 04:43:50PM +0100, Johannes Schindelin wrote:
> > > Hi Emily,
> > >
> > > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> > >
> > > > Alongside the list of loose objects, it's useful to see the list of
> > > > object packs as well. It can help us to examine what Git did and did not
> > > > pack.
> > >
> > > Yes, I was write! Packs are next ;-)
> > >
> > > >
> > > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > > > ---
> > > >  bugreport.c         | 21 +++++++++++++++++++++
> > > >  bugreport.h         |  6 ++++++
> > > >  builtin/bugreport.c |  4 ++++
> > > >  3 files changed, 31 insertions(+)
> > > >
> > > > diff --git a/bugreport.c b/bugreport.c
> > > > index 54e1d47103..79ddb8baaa 100644
> > > > --- a/bugreport.c
> > > > +++ b/bugreport.c
> > > > @@ -219,3 +219,24 @@ void get_loose_object_summary(struct strbuf *obj_info)
> > > >  			    objects.nr);
> > > >  	}
> > > >  }
> > > > +
> > > > +void get_packed_object_summary(struct strbuf *obj_info)
> > > > +{
> > > > +	struct strbuf dirpath = STRBUF_INIT;
> > > > +	struct string_list contents = STRING_LIST_INIT_DUP;
> > > > +	struct string_list_item *entry;
> > > > +
> > > > +	strbuf_reset(obj_info);
> > > > +
> > > > +	strbuf_addstr(&dirpath, get_object_directory());
> > > > +	strbuf_complete(&dirpath, '/');
> > > > +	strbuf_addstr(&dirpath, "pack/");
> > > > +	list_contents_of_dir(&contents, &dirpath, 0, 0);
> > > > +
> > > > +	// list full contents of $GIT_OBJECT_DIRECTORY/pack/
> > > > +	for_each_string_list_item(entry, &contents) {
> > > > +		strbuf_addbuf(obj_info, &dirpath);
> > > > +		strbuf_addstr(obj_info, entry->string);
> > > > +		strbuf_complete_line(obj_info);
> > > > +	}
> > > > +}
> > >
> > > Okay, but I think that you will want to discern between regular `.pack`
> > > files, `.idx` files and `tmp_*` files.
> >
> > Discern in what way? How would you like to see them treated separately?
> > They're all being listed here, not just counted, so it seems to me like
> > I can read the generated bugreport and see which files are index, pack,
> > or temporary here.
> 
> I take your word for it (sorry, it's been half an eternity since I wrapped
> my head around the diff, I forgotten pretty much all about it ;-))

That makes two of us. :)

> 
> Ciao,
> Dscho
> 
> >
> > >
> > > > diff --git a/bugreport.h b/bugreport.h
> > > > index 09ad0c2599..11ff7df41b 100644
> > > > --- a/bugreport.h
> > > > +++ b/bugreport.h
> > > > @@ -24,3 +24,9 @@ void get_populated_hooks(struct strbuf *hook_info);
> > > >   * will be discarded.
> > > >   */
> > > >  void get_loose_object_summary(struct strbuf *obj_info);
> > > > +
> > > > +/**
> > > > + * Adds a list of the contents of '.git/objects/pack'. The previous contents of
> > > > + * hook_info will be discarded.
> > > > + */
> > > > +void get_packed_object_summary(struct strbuf *obj_info);
> > > > diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> > > > index b2ab194207..da91a3944e 100644
> > > > --- a/builtin/bugreport.c
> > > > +++ b/builtin/bugreport.c
> > > > @@ -68,6 +68,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
> > > >  	get_loose_object_summary(&buffer);
> > > >  	strbuf_write(&buffer, report);
> > > >
> > > > +	add_header(report, "Packed Object Summary");
> > > > +	get_packed_object_summary(&buffer);
> > > > +	strbuf_write(&buffer, report);
> > > > +
> > >
> > > Hmm. At this point, I am unclear whether you want to write into an
> > > `strbuf`, or directly into a `FILE *`? I would rather have only one, not
> > > a mix.
> > >
> > > Ciao,
> > > Dscho
> > >
> > > >  	fclose(report);
> > > >
> > > >  	launch_editor(report_path.buf, NULL, NULL);
> > > > --
> > > > 2.24.0.rc0.303.g954a862665-goog
> > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 4/9] bugreport: add config values from whitelist
  2019-12-11 20:48         ` Emily Shaffer
@ 2019-12-15 17:30           ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-12-15 17:30 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Wed, 11 Dec 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 03:14:35PM +0100, Johannes Schindelin wrote:
>
> >
> > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> >
> > > Teach bugreport to gather the values of config options which are
> > > present in 'git-bugreport-config-whitelist'.
> > >
> > > Many config options are sensitive, and many Git add-ons use config
> > > options which git-core does not know about; it is better only to
> > > gather config options which we know to be safe, rather than
> > > excluding options which we know to be unsafe.
> >
> > Should we still have the `// bugreport-exclude` comments, then?
>
> They were optional (useless) before too. I can remove them if you want;
> I suppose I like the idea of having precedent if someone wants to build
> their own internal version with opt-out configs rather than opt-in. I
> can remove them if we want; it doesn't matter very much to me either
> way.

How are you guaranteeing that this information does not become stale,
like, immediately?

If you _still_ insist on keeping those comments even after answering this
question, at least please turn them into C-style comments: we have no
C++-style comments in Git and want to keep it this way.

> > > diff --git a/bugreport.c b/bugreport.c
> > > [...]
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +void get_whitelisted_config(struct strbuf *config_info)
> > > +{
> > > +	struct strbuf path = STRBUF_INIT;
> > > +
> > > +	strbuf_addstr(&path, git_exec_path());
> > > +	strbuf_addstr(&path, "/git-bugreport-config-whitelist");
> >
> > Hmm. I would have expected this patch to come directly after the patch
> > 2/9 that generates that white-list, and I would also have expected that
> > to be pre-sorted, and compiled in.
> >
> > Do you want users to _edit_ the file in the exec path? In general, that
> > path will be write-protected, though. A better alternative would
> > probably be to compile in a hard-coded list, and to allow including more
> > values e.g. by offering command-line options to specify config setting
> > patterns. But if we allow patterns, we might actually want to have those
> > exclusions to prevent sensitive data from being included.
>
> Hm, interesting. Do we have precedent for compiling in a header
> generated during the build process? I think I saw one when I was adding
> this script - I'll take a look.

Yes, we do. The entire `command-list.h` is generated during the build.
Look for `GENERATED_H` in the `Makefile`.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 5/9] bugreport: collect list of populated hooks
  2019-12-11 20:51         ` Emily Shaffer
@ 2019-12-15 17:40           ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2019-12-15 17:40 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Wed, 11 Dec 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 03:31:36PM +0100, Johannes Schindelin wrote:
>
> > On Thu, 24 Oct 2019, Emily Shaffer wrote:
> >
> > > diff --git a/bugreport.c b/bugreport.c
> > > index afa4836ab1..9d7f44ff28 100644
> > > --- a/bugreport.c
> > > +++ b/bugreport.c
> > > @@ -103,3 +103,47 @@ void get_whitelisted_config(struct strbuf *config_info)
> > >  	strbuf_reset(config_info);
> > >  	strbuf_addbuf(config_info, &configs_and_values);
> > >  }
> > > +
> > > +void get_populated_hooks(struct strbuf *hook_info)
> > > +{
> > > +	/*
> > > +	 * Doesn't look like there is a list of all possible hooks; so below is
> > > +	 * a transcription of `git help hook`.
> > > +	 */
> > > +	const char *hooks = "applypatch-msg,"
> > > +			    "pre-applypatch,"
> > > +			    "post-applypatch,"
> > > +			    "pre-commit,"
> > > +			    "pre-merge-commit,"
> > > +			    "prepare-commit-msg,"
> > > +			    "commit-msg,"
> > > +			    "post-commit,"
> > > +			    "pre-rebase,"
> > > +			    "post-checkout,"
> > > +			    "post-merge,"
> > > +			    "pre-push,"
> > > +			    "pre-receive,"
> > > +			    "update,"
> > > +			    "post-receive,"
> > > +			    "post-update,"
> > > +			    "push-to-checkout,"
> > > +			    "pre-auto-gc,"
> > > +			    "post-rewrite,"
> > > +			    "sendemail-validate,"
> > > +			    "fsmonitor-watchman,"
> > > +			    "p4-pre-submit,"
> > > +			    "post-index-changex";
> >
> > Well, this is disappointing: I tried to come up with a scripted way to
> > extract the commit names from our source code, and I failed! This is
> > where I gave up:
> >
> > 	git grep run_hook |
> > 	sed -n 's/.*run_hook[^(]*([^,]*, *\([^,]*\).*/\1/p' |
> > 	sed -e '/^name$/d' -e '/^const char \*name$/d' \
> > 		-e 's/push_to_checkout_hook/"push-to-checkout"/' |
> > 	sort |
> > 	uniq
> >
> > This prints only the following hook names:
> >
> > 	"applypatch-msg"
> > 	"post-applypatch"
> > 	"post-checkout"
> > 	"post-index-change"
> > 	"post-merge"
> > 	"pre-applypatch"
> > 	"pre-auto-gc"
> > 	"pre-rebase"
> > 	"prepare-commit-msg"
> > 	"push-to-checkout"
> >
> > But at least it was easy to script the extracting from the
> > Documentation:
> >
> > 	sed -n '/^[a-z]/{N;/\n~~~/{s/\n.*//;p}}' \
> > 		Documentation/githooks.txt
> >
>
> To be totally frank, I'm not keen on spending a lot of time with
> scripting this. My parallel work with config-based hooks will also
> involve an in-code source of truth for available hooknames; I'd like to
> punt on some scripting, putting it in the makefile, etc blah since I
> know I'm planning to fix this particular annoyance at the source - and
> since it looks like bugreport will stay in the review phase for at least
> a little longer.

Fair enough, especially if it addresses my complaint about the
scriptability.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2019-12-15 17:40 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-15  2:34 [PATCH] bugreport: add tool to generate debugging info Emily Shaffer
2019-08-15 14:15 ` Derrick Stolee
2019-08-15 14:36   ` Junio C Hamano
2019-08-15 22:52     ` Emily Shaffer
2019-08-15 23:40       ` Junio C Hamano
2019-08-16  1:25         ` Emily Shaffer
2019-08-16 16:41           ` Junio C Hamano
2019-08-16 19:08             ` Emily Shaffer
2019-08-15 20:07   ` Johannes Schindelin
2019-08-15 22:24     ` Emily Shaffer
2019-08-16 20:19       ` Johannes Schindelin
2019-08-15 20:13   ` Emily Shaffer
2019-08-15 18:10 ` Junio C Hamano
2019-08-15 21:52   ` Emily Shaffer
2019-08-15 22:29     ` Junio C Hamano
2019-08-15 22:54       ` Emily Shaffer
2019-08-17  0:39 ` [PATCH v2 0/2] add git-bugreport tool Emily Shaffer
2019-08-17  0:39   ` [PATCH v2 1/2] bugreport: add tool to generate debugging info Emily Shaffer
2019-08-17  0:39   ` [PATCH v2 2/2] bugreport: generate config whitelist based on docs Emily Shaffer
2019-08-17 20:38     ` Martin Ågren
2019-08-21 17:40       ` Emily Shaffer
2019-10-25  2:51   ` [PATCH v3 0/9] add git-bugreport tool Emily Shaffer
2019-10-25  2:51     ` [PATCH v3 1/9] bugreport: add tool to generate debugging info Emily Shaffer
2019-10-29 20:29       ` Josh Steadmon
2019-11-16  3:11       ` Junio C Hamano
2019-11-19 20:25         ` Emily Shaffer
2019-11-19 23:24           ` Johannes Schindelin
2019-11-20  0:37             ` Junio C Hamano
2019-11-20 10:51               ` Johannes Schindelin
2019-11-19 23:31           ` Johannes Schindelin
2019-11-20  0:39             ` Junio C Hamano
2019-11-20  2:09             ` Emily Shaffer
2019-11-20  0:32           ` Junio C Hamano
2019-10-25  2:51     ` [PATCH v3 2/9] bugreport: generate config whitelist based on docs Emily Shaffer
2019-10-28 13:27       ` Johannes Schindelin
2019-10-25  2:51     ` [PATCH v3 3/9] bugreport: add version and system information Emily Shaffer
2019-10-28 13:49       ` Johannes Schindelin
2019-11-08 21:48         ` Emily Shaffer
2019-11-11 13:48           ` Johannes Schindelin
2019-11-14 21:42             ` Emily Shaffer
2019-10-29 20:43       ` Josh Steadmon
2019-10-25  2:51     ` [PATCH v3 4/9] bugreport: add config values from whitelist Emily Shaffer
2019-10-28 14:14       ` Johannes Schindelin
2019-12-11 20:48         ` Emily Shaffer
2019-12-15 17:30           ` Johannes Schindelin
2019-10-29 20:58       ` Josh Steadmon
2019-10-30  1:37         ` Junio C Hamano
2019-11-14 21:55           ` Emily Shaffer
2019-10-25  2:51     ` [PATCH v3 5/9] bugreport: collect list of populated hooks Emily Shaffer
2019-10-28 14:31       ` Johannes Schindelin
2019-12-11 20:51         ` Emily Shaffer
2019-12-15 17:40           ` Johannes Schindelin
2019-10-25  2:51     ` [PATCH v3 6/9] bugreport: count loose objects Emily Shaffer
2019-10-28 15:07       ` Johannes Schindelin
2019-12-10 22:34         ` Emily Shaffer
2019-10-29 21:18       ` Josh Steadmon
2019-10-25  2:51     ` [PATCH v3 7/9] bugreport: add packed object summary Emily Shaffer
2019-10-28 15:43       ` Johannes Schindelin
2019-12-11  0:29         ` Emily Shaffer
2019-12-11 13:37           ` Johannes Schindelin
2019-12-11 20:52             ` Emily Shaffer
2019-10-25  2:51     ` [PATCH v3 8/9] bugreport: list contents of $OBJDIR/info Emily Shaffer
2019-10-28 15:51       ` Johannes Schindelin
2019-10-25  2:51     ` [PATCH v3 9/9] bugreport: print contents of alternates file Emily Shaffer
2019-10-28 15:57       ` Johannes Schindelin
2019-11-19 20:40         ` Emily Shaffer
2019-10-29  1:54     ` [PATCH v3 0/9] add git-bugreport tool Junio C Hamano
2019-10-29 11:13       ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).