All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Ivan Frade via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Ivan Frade <ifrade@google.com>
Subject: Re: [PATCH v3] fetch-pack: redact packfile urls in traces
Date: Wed, 20 Oct 2021 13:41:57 +0200	[thread overview]
Message-ID: <211020.868rynoqfv.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <pull.1052.v3.git.1634684260142.gitgitgadget@gmail.com>


On Tue, Oct 19 2021, Ivan Frade via GitGitGadget wrote:

> From: Ivan Frade <ifrade@google.com>
>
> In some setups, packfile uris act as bearer token. It is not
> recommended to expose them plainly in logs, although in special
> circunstances (e.g. debug) it makes sense to write them.
>
> Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
> variable is set to false. This mimics the redacting of the Authorization
> header in HTTP.
>
> Signed-off-by: Ivan Frade <ifrade@google.com>
> ---
>     fetch-pack: redact packfile urls in traces
>     
>     Changes since v1:

Just context for other reviewers:

s/Changes since v1/Changes since v2/ I see, from the context of
https://lore.kernel.org/git/pull.1052.v2.git.1633746024175.gitgitgadget@gmail.com/

>      * Redact only the path of the URL
>      * Test are now strict, validating the exact line expected in the log

And this was changed in v2...

>     Changes since v1:
>     
>      * Removed non-POSIX flags in tests
>      * More accurate regex for the non-encrypted packfile line

[...]

>      * Dropped documentation change
>      * Dropped redacting the die message in http-fetch

Since both of those were done I think in response to my feedback I just
want to clarify (if needed):

 * That the documentation change is still good to have, although I had
   feedback on fixing that more generally in the protocol v2 docs. It
   would be great if you could still pursue it (and that I didn't
   discourage you from doing so).

 * I think having this redaction in die() could still be valuable,
   e.g. your packfile-uri's start failing, and now users are
   copy/pasting "private" URLs that contain their passwords or whatever
   to try to get help, that would be bad.   

   But perhaps if you don't have private URLs redacting them
   unconditionally would slow down debugging for some, i.e. you have 10x
   pasted URLs, and all the errors are from one set of servers (although
   your current redaction includes the hostname, which I think would
   address most cases of say one CDN node failing).

   It was really just a comment that your v1's commit message didn't
   mention or justify it, but just having it make a mention of it would
   also be an OK solution, or fold that into another patch or
   whatever...

>  {
> +	int saved_options;
>  	process_section_header(reader, "packfile-uris", 0);
> +	/*
> +	 * In some setups, packfile-uris act as bearer tokens,
> +	 * redact them by default.
> +	 */
> +	saved_options = reader->options;

nit: no need to pre-declare "int saved_options" here, just move this &
the comment above "process_section_header" (in this case I'd say a
comment isn't even needed, obvious from context...), or it should be at
the definition of PACKET_READ_REDACT_URL_PATH... (more below)

> +	if (git_env_bool("GIT_TRACE_REDACT", 1))

If we're going to use GIT_TRACE_REDACT for this the documentation needs updating:

Documentation/git.txt:`GIT_TRACE_REDACT`::
Documentation/git.txt-  By default, when tracing is activated, Git redacts the values of
Documentation/git.txt-  cookies, the "Authorization:" header, and the "Proxy-Authorization:"
Documentation/git.txt-  header. Set this variable to `0` to prevent this redaction.

> +		reader->options |= PACKET_READ_REDACT_URL_PATH;

(continued)... but that was from a really narrow reading of the code, I
think this whole flip-flopping of options back and forth isn't needed at
all, and you should just assign this flag at the top of
do_fetch_pack_v2(), no?  The setting of it also looks like it belongs
with the reading of "GIT_TEST_SIDEBAND_ALL".

I.e. nothing else uses PACKET_READ_REDACT_URL_PATH, why do we need to be
flipping it back & forth? Keeping these flags in the "reader" is what
that member is for, isn't it? Maybe I'm missing something.

> +static int find_url_path_start(const char* buffer)
> +{
> +	const char *URL_MARK = "://";
> +	char *p = strstr(buffer, URL_MARK);
> +	if (!p) {
> +		return -1;
> +	}
> +
> +	p += strlen(URL_MARK);
> +	while (*p && *p != '/')
> +		p++;
> +
> +	// Position after '/'
> +	if (*p && *(p + 1))
> +		return (p + 1) - buffer;
> +
> +	return -1;
> +}

I think that packfile URI only supports http:// and https://, not
file:// or whatever, so I wonder if either curl or we have a helper
function for this that we can use....(more below)

>  enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  						size_t *src_len, char *buffer,
>  						unsigned size, int *pktlen,
> @@ -393,6 +412,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  {
>  	int len;
>  	char linelen[4];
> +	int url_path_start;
>  
>  	if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
>  		*pktlen = -1;
> @@ -443,7 +463,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		len--;
>  
>  	buffer[len] = 0;
> -	packet_trace(buffer, len, 0);
> +	if (options & PACKET_READ_REDACT_URL_PATH &&
> +	    (url_path_start = find_url_path_start(buffer)) != -1) {
> +		const char *redacted = "<redacted>";
> +		struct strbuf tracebuf = STRBUF_INIT;
> +		strbuf_insert(&tracebuf, 0, buffer, len);
> +		strbuf_splice(&tracebuf, url_path_start,
> +			      len - url_path_start, redacted, strlen(redacted));
> +		packet_trace(tracebuf.buf, tracebuf.len, 0);
> +		strbuf_release(&tracebuf);
> +	} else {
> +		packet_trace(buffer, len, 0);
> +	}

...If we're redacting the URL isn't (and this might be less code with a
helper function) saying:

    failed to get 'https' url from 'somehost.example.com' (full URL redacted due to XYZ setting)

Friendlier than something like (which this function sets up):

    failed to get 'https://somehost.example.com/<redacted>' url

I.e. it allows us to use a get_schema_from_url() and get_host_from_url()
functions (I don't know if/where we have those, but seems likelier than
"find path url boundary" (or maybe I'm wrong and we always feed those
as-is to curl et al).

  reply	other threads:[~2021-10-20 12:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget
2021-10-08 19:36   ` Ævar Arnfjörð Bjarmason
2021-10-08 23:15     ` Ivan Frade
2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget
2021-10-08 19:43   ` Ævar Arnfjörð Bjarmason
2021-10-09  2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-11 20:39   ` Junio C Hamano
2021-10-26 19:32     ` Ivan Frade
2021-10-19 22:57   ` [PATCH v3] " Ivan Frade via GitGitGadget
2021-10-20 11:41     ` Ævar Arnfjörð Bjarmason [this message]
2021-10-26 22:49     ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget
2021-10-26 22:49       ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget
2021-10-28  1:01         ` Junio C Hamano
2021-10-28 22:15           ` Ivan Frade
2021-10-28 22:46             ` Junio C Hamano
2021-10-26 22:49       ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-28 16:39         ` Ævar Arnfjörð Bjarmason
2021-10-28 17:25           ` Eric Sunshine
2021-10-28 22:44             ` Ivan Frade
2021-10-28 22:41           ` Ivan Frade
2021-10-29 23:18           ` Junio C Hamano
2021-11-09  1:54             ` Ævar Arnfjörð Bjarmason
2021-10-28 22:51       ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-28 22:51         ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget
2021-10-28 23:21           ` Junio C Hamano
2021-10-29 18:42             ` Ivan Frade
2021-10-29 19:59               ` Junio C Hamano
2021-11-08 22:43                 ` Jonathan Tan
2021-10-28 22:51         ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-29 18:42         ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-29 18:42           ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget
2021-11-08 23:01             ` Jonathan Tan
2021-11-09  1:36               ` Ævar Arnfjörð Bjarmason
2021-11-10 23:44                 ` Ivan Frade
2021-11-11  0:01                   ` Ævar Arnfjörð Bjarmason
2021-11-10 21:18               ` Ivan Frade
2021-10-29 18:42           ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-08 23:06             ` Jonathan Tan
2021-11-10 23:51           ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-11-10 23:51             ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget
2021-11-10 23:51             ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-12  4:43             ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211020.868rynoqfv.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=ifrade@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.