* [PATCH 0/2] fetch-pack: redact packfile urls in traces @ 2021-10-08 16:03 Ivan Frade via GitGitGadget 2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-08 16:03 UTC (permalink / raw) To: git; +Cc: Ivan Frade In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade ifrade@google.com Ivan Frade (2): fetch-pack: redact packfile urls in traces Documentation: packfile-uri hash can be longer than 40 hex chars Documentation/technical/protocol-v2.txt | 8 ++--- fetch-pack.c | 11 +++++++ http-fetch.c | 4 ++- pkt-line.c | 7 +++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 43 +++++++++++++++++++++++++ 6 files changed, 68 insertions(+), 6 deletions(-) base-commit: 0785eb769886ae81e346df10e88bc49ffc0ac64e Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 -- gitgitgadget ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 1/2] fetch-pack: redact packfile urls in traces 2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget @ 2021-10-08 16:03 ` Ivan Frade via GitGitGadget 2021-10-08 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget 2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-08 16:03 UTC (permalink / raw) To: git; +Cc: Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- fetch-pack.c | 11 +++++++++++ http-fetch.c | 4 +++- pkt-line.c | 7 ++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 43 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 64 insertions(+), 2 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..05c85eeafa1 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { + int original_options; process_section_header(reader, "packfile-uris", 0); + /* + * In some setups, packfile-uris act as bearer tokens, + * redact them by default. + */ + original_options = reader->options; + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader->options |= PACKET_READ_REDACT_ON_TRACE; + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || reader->line[the_hash_algo->hexsz] != ' ') @@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader, string_list_append(uris, reader->line); } + reader->options = original_options; + if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } diff --git a/http-fetch.c b/http-fetch.c index fa642462a9e..d35e33e4f65 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash, if (start_active_slot(preq->slot)) { run_active_slot(preq->slot); if (results.curl_result != CURLE_OK) { - die("Unable to get pack file %s\n%s", preq->url, + int showUrl = git_env_bool("GIT_TRACE_REDACT", 1); + die("Unable to get offloaded pack file %s\n%s", + showUrl ? preq->url : "<redacted>", curl_errorstr); } } else { diff --git a/pkt-line.c b/pkt-line.c index de4a94b437e..8da8ed88ccf 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -443,7 +443,12 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_ON_TRACE) { + const char *redacted = "<redacted>"; + packet_trace(redacted, strlen(redacted), 0); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 82b95e4bdd3..44c02f3bc6e 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -88,6 +88,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_ON_TRACE (1u<<4) int packet_read(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int options); diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..a620a678a56 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,49 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + configure_exclusion "$P" my-blob >h && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>" +' + +test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + configure_exclusion "$P" my-blob >h && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -A1 "clone<\ ..packfile-uris" log | grep -E "clone<\ ..[[:alnum:]]{40,64}\ http" +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 1/2] fetch-pack: redact packfile urls in traces 2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget @ 2021-10-08 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-08 23:15 ` Ivan Frade 0 siblings, 1 reply; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-08 19:36 UTC (permalink / raw) To: Ivan Frade via GitGitGadget; +Cc: git, Ivan Frade On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote: > diff --git a/http-fetch.c b/http-fetch.c > index fa642462a9e..d35e33e4f65 100644 > --- a/http-fetch.c > +++ b/http-fetch.c > @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash, > if (start_active_slot(preq->slot)) { > run_active_slot(preq->slot); > if (results.curl_result != CURLE_OK) { > - die("Unable to get pack file %s\n%s", preq->url, > + int showUrl = git_env_bool("GIT_TRACE_REDACT", 1); > + die("Unable to get offloaded pack file %s\n%s", > + showUrl ? preq->url : "<redacted>", > curl_errorstr); > } > } else { Your CL and commit message just talk about traes, but this is a die() message. Perhaps it makes sense to redact it there too for some reason, but that seems to be a thing to separately argue for. This message is shown interactively to users, and I could see it be annoying to not have the URL that failed in your terminal output, even if it has some one-time token. Which is presumably different from the use-cases you're thinking of, I'm assuming some logging of detached processes, or central logging of user actions. > +test_expect_success 'packfile-uri redacted in trace' ' > + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && > + rm -rf "$P" http_child log && > + > + git init "$P" && > + git -C "$P" config "uploadpack.allowsidebandall" "true" && > + > + echo my-blob >"$P/my-blob" && > + git -C "$P" add my-blob && > + git -C "$P" commit -m x && > + > + configure_exclusion "$P" my-blob >h && > + > + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ > + git -c protocol.version=2 \ > + -c fetch.uriprotocols=http,https \ > + clone "$HTTPD_URL/smart/http_parent" http_child && > + > + grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>" We don't rely on GNU options like those for the test suite, it'll break on various supported platformrs. In this case the whole LHS of the pipe looks like it could be dropped, why not grep for "^clone< <redacted>"? Also you don't need to quote the space character in regexes, it's not a metacharacter. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/2] fetch-pack: redact packfile urls in traces 2021-10-08 19:36 ` Ævar Arnfjörð Bjarmason @ 2021-10-08 23:15 ` Ivan Frade 0 siblings, 0 replies; 43+ messages in thread From: Ivan Frade @ 2021-10-08 23:15 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ivan Frade via GitGitGadget, git On Fri, Oct 8, 2021 at 12:42 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote: > > > diff --git a/http-fetch.c b/http-fetch.c > > index fa642462a9e..d35e33e4f65 100644 > > --- a/http-fetch.c > > +++ b/http-fetch.c > > @@ -63,7 +63,9 @@ static void fetch_single_packfile(struct object_id *packfile_hash, > > if (start_active_slot(preq->slot)) { > > run_active_slot(preq->slot); > > if (results.curl_result != CURLE_OK) { > > - die("Unable to get pack file %s\n%s", preq->url, > > + int showUrl = git_env_bool("GIT_TRACE_REDACT", 1); > > + die("Unable to get offloaded pack file %s\n%s", > > + showUrl ? preq->url : "<redacted>", > > curl_errorstr); > > } > > } else { > > Your CL and commit message just talk about traes, but this is a die() > message. > > Perhaps it makes sense to redact it there too for some reason, but that > seems to be a thing to separately argue for. > > This message is shown interactively to users, and I could see it be > annoying to not have the URL that failed in your terminal output, even > if it has some one-time token. For a regular user the URL could be confusing (should they click on it? try to download it by themselves?). I also got a suggestion to print e.g. only the domain and maybe the packname. In any case, I agree it is a different thing than trace logging. I removed it from this patch. > > > + > > + grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>" > > We don't rely on GNU options like those for the test suite, it'll break > on various supported platformrs. > > In this case the whole LHS of the pipe looks like it could be dropped, > why not grep for "^clone< <redacted>"? > > > Also you don't need to quote the space character in regexes, it's not a > metacharacter. Updated the grep expressions to look only for the relevant lines and removed the escaping of the space char. I was trying to limit the grep to the "packfile-uri" section, not to match something else by accident, but I think "obj-id http://" shouldn't match anything else in the clone response (no ref can start with http://). Thanks for the quick review! Ivan ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars 2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget @ 2021-10-08 16:03 ` Ivan Frade via GitGitGadget 2021-10-08 19:43 ` Ævar Arnfjörð Bjarmason 2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-08 16:03 UTC (permalink / raw) To: git; +Cc: Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> Packfile-uri line specifies a hash of 40 hex character, but with SHA256 this hash size is 64. There are already tests using SHA256 (e.g. in ubuntu-latest/linux-clang). Update protocol-v2 documentation to indicate that the hash size depends on the hash algorithm in use. Signed-off-by: Ivan Frade <ifrade@google.com> --- Documentation/technical/protocol-v2.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 21e8258ccf3..a23f12d6c2b 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -393,7 +393,7 @@ header. Most sections are sent only when the packfile is sent. wanted-ref = obj-id SP refname packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri - packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF) + packfile-uri = PKT-LINE((40|64)*(HEXDIGIT) SP *%x20-ff LF) packfile = PKT-LINE("packfile" LF) *PKT-LINE(%x01-03 *%x00-ff) @@ -476,9 +476,9 @@ header. Most sections are sent only when the packfile is sent. * For each URI the server sends, it sends a hash of the pack's contents (as output by git index-pack) followed by the URI. - * The hashes are 40 hex characters long. When Git upgrades to a new - hash algorithm, this might need to be updated. (It should match - whatever index-pack outputs after "pack\t" or "keep\t". + * The hashes length is defined by the hash algorithm (40 hex + characters in SHA-1, 64 in SHA-256). It should match whatever + index-pack outputs after "pack\t" or "keep\t". packfile section * This section is only included if the client has sent 'want' -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars 2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget @ 2021-10-08 19:43 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-08 19:43 UTC (permalink / raw) To: Ivan Frade via GitGitGadget; +Cc: git, Ivan Frade On Fri, Oct 08 2021, Ivan Frade via GitGitGadget wrote: > From: Ivan Frade <ifrade@google.com> > > Packfile-uri line specifies a hash of 40 hex character, but with SHA256 > this hash size is 64. There are already tests using SHA256 (e.g. in > ubuntu-latest/linux-clang). > > Update protocol-v2 documentation to indicate that the hash size depends > on the hash algorithm in use. > > Signed-off-by: Ivan Frade <ifrade@google.com> > --- > Documentation/technical/protocol-v2.txt | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt > index 21e8258ccf3..a23f12d6c2b 100644 > --- a/Documentation/technical/protocol-v2.txt > +++ b/Documentation/technical/protocol-v2.txt > @@ -393,7 +393,7 @@ header. Most sections are sent only when the packfile is sent. > wanted-ref = obj-id SP refname > > packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri > - packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF) > + packfile-uri = PKT-LINE((40|64)*(HEXDIGIT) SP *%x20-ff LF) > > packfile = PKT-LINE("packfile" LF) > *PKT-LINE(%x01-03 *%x00-ff) > @@ -476,9 +476,9 @@ header. Most sections are sent only when the packfile is sent. > * For each URI the server sends, it sends a hash of the pack's > contents (as output by git index-pack) followed by the URI. > > - * The hashes are 40 hex characters long. When Git upgrades to a new > - hash algorithm, this might need to be updated. (It should match > - whatever index-pack outputs after "pack\t" or "keep\t". > + * The hashes length is defined by the hash algorithm (40 hex > + characters in SHA-1, 64 in SHA-256). It should match whatever > + index-pack outputs after "pack\t" or "keep\t". > > packfile section > * This section is only included if the client has sent 'want' (I forgot to say in my first reply, but welcome to the Git Mailing List!) This is well spotted, but it seems even better to simply drop this exhaustive listing of 40 or 64 hex digits here. In protocol-common.txt we talk about "obj-id", and that's then used elsewhere in protocol-v2.txt matter-of-factly, e.g. (quoting from a handy part that happens to use "obj-id"): [...] obj-id-or-unborn = (obj-id | "unborn") ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF) [...] So let's just have packfile-uri do the same. Now, the thing that *does* need to be updated then is protocol-common.txt, or this part: zero-id = 40*"0" obj-id = 40*(HEXDIGIT) Because now if you use obj-id that'll just refer back to that, but that's also a problem with all the rest of the protocol docs. It would seem that all our SHA-256 tests and client/servers are in violation of the documentation, and should truncate their OIDs to 40 chars, or we could fix the docs :) Anyway, whatever we do here this improvement is unrelated to whatever we're doing with log redaction in your 1/2, I think it would be better to submit as its own 1 or 2 patch series. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2] fetch-pack: redact packfile urls in traces 2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget 2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget @ 2021-10-09 2:20 ` Ivan Frade via GitGitGadget 2021-10-11 20:39 ` Junio C Hamano 2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget 2 siblings, 2 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-09 2:20 UTC (permalink / raw) To: git; +Cc: Ævar Arnfjörð Bjarmason, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Changes since v1: - Removed non-POSIX flags in tests - More accurate regex for the non-encrypted packfile line - Dropped documentation change - Dropped redacting the die message in http-fetch Signed-off-by: Ivan Frade <ifrade@google.com> --- fetch-pack: redact packfile urls in traces In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade ifrade@google.com cc: Ævar Arnfjörð Bjarmason avarab@gmail.com Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v1: 1: b473f145a87 ! 1: 701cb7a6ab9 fetch-pack: redact packfile urls in traces @@ Commit message variable is set to false. This mimics the redacting of the Authorization header in HTTP. + Changes since v1: + - Removed non-POSIX flags in tests + - More accurate regex for the non-encrypted packfile line + - Dropped documentation change + - Dropped redacting the die message in http-fetch + Signed-off-by: Ivan Frade <ifrade@google.com> ## fetch-pack.c ## @@ fetch-pack.c: static void receive_packfile_uris(struct packet_reader *reader, die("expected DELIM"); } - ## http-fetch.c ## -@@ http-fetch.c: static void fetch_single_packfile(struct object_id *packfile_hash, - if (start_active_slot(preq->slot)) { - run_active_slot(preq->slot); - if (results.curl_result != CURLE_OK) { -- die("Unable to get pack file %s\n%s", preq->url, -+ int showUrl = git_env_bool("GIT_TRACE_REDACT", 1); -+ die("Unable to get offloaded pack file %s\n%s", -+ showUrl ? preq->url : "<redacted>", - curl_errorstr); - } - } else { - ## pkt-line.c ## @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + -+ grep -A1 "clone<\ ..packfile-uris" log | grep "clone<\ <redacted>" ++ grep "clone< <redacted>" log +' + +test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' ' @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + -+ grep -A1 "clone<\ ..packfile-uris" log | grep -E "clone<\ ..[[:alnum:]]{40,64}\ http" ++ grep -E "clone< ..[0-9a-f]{40,64} http://" log +' + test_expect_success 'http:// --negotiate-only' ' 2: 497c5fd18d7 < -: ----------- Documentation: packfile-uri hash can be longer than 40 hex chars fetch-pack.c | 11 +++++++++++ pkt-line.c | 7 ++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 43 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 61 insertions(+), 1 deletion(-) diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..05c85eeafa1 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { + int original_options; process_section_header(reader, "packfile-uris", 0); + /* + * In some setups, packfile-uris act as bearer tokens, + * redact them by default. + */ + original_options = reader->options; + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader->options |= PACKET_READ_REDACT_ON_TRACE; + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || reader->line[the_hash_algo->hexsz] != ' ') @@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader, string_list_append(uris, reader->line); } + reader->options = original_options; + if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } diff --git a/pkt-line.c b/pkt-line.c index de4a94b437e..8da8ed88ccf 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -443,7 +443,12 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_ON_TRACE) { + const char *redacted = "<redacted>"; + packet_trace(redacted, strlen(redacted), 0); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 82b95e4bdd3..44c02f3bc6e 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -88,6 +88,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_ON_TRACE (1u<<4) int packet_read(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int options); diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..f0273317861 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,49 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + configure_exclusion "$P" my-blob >h && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep "clone< <redacted>" log +' + +test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + configure_exclusion "$P" my-blob >h && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -E "clone< ..[0-9a-f]{40,64} http://" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && base-commit: 0785eb769886ae81e346df10e88bc49ffc0ac64e -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2] fetch-pack: redact packfile urls in traces 2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget @ 2021-10-11 20:39 ` Junio C Hamano 2021-10-26 19:32 ` Ivan Frade 2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget 1 sibling, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2021-10-11 20:39 UTC (permalink / raw) To: Ivan Frade via GitGitGadget Cc: git, Ævar Arnfjörð Bjarmason, Ivan Frade "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Ivan Frade <ifrade@google.com> > > In some setups, packfile uris act as bearer token. It is not > recommended to expose them plainly in logs, although in special > circunstances (e.g. debug) it makes sense to write them. > > Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT > variable is set to false. This mimics the redacting of the > Authorization header in HTTP. Well explained. It of course is a different matter if the explained idea is agreeable, though ;-). Hiding the entire packet, based on the "it might be in some setups" seems a bit too much. Is it often the case that the whole URI is sensitive, or perhaps leading "<scheme>://<host>/pack-<abc>.pack" part is not sensitive at all, and what follows after that "public" part has some "nonce" material that makes it sensitive? > Changes since v1: > - Removed non-POSIX flags in tests > - More accurate regex for the non-encrypted packfile line > - Dropped documentation change > - Dropped redacting the die message in http-fetch These are not for those who read "git log" in 3 months, as they may not even have seen the "v1". But these are very helpful for those who read the "v1" to see how good this round is. Please write such material below the three-dash line. > Signed-off-by: Ivan Frade <ifrade@google.com> > --- i.e. here. > fetch-pack: redact packfile urls in traces > > In some setups, packfile uris act as bearer token. It is not recommended > to expose them plainly in logs, although in special circunstances (e.g. > debug) it makes sense to write them. > > Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT > variable is set to false. This mimics the redacting of the Authorization > header in HTTP. > > Signed-off-by: Ivan Frade ifrade@google.com > > cc: Ævar Arnfjörð Bjarmason avarab@gmail.com And there is no need to duplicate the log message here ;-) > diff --git a/fetch-pack.c b/fetch-pack.c > index a9604f35a3e..05c85eeafa1 100644 > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader, > static void receive_packfile_uris(struct packet_reader *reader, > struct string_list *uris) > { > + int original_options; > process_section_header(reader, "packfile-uris", 0); > + /* > + * In some setups, packfile-uris act as bearer tokens, > + * redact them by default. > + */ > + original_options = reader->options; > + if (git_env_bool("GIT_TRACE_REDACT", 1)) > + reader->options |= PACKET_READ_REDACT_ON_TRACE; > + > while (packet_reader_read(reader) == PACKET_READ_NORMAL) { > if (reader->pktlen < the_hash_algo->hexsz || > reader->line[the_hash_algo->hexsz] != ' ') > @@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader, > > string_list_append(uris, reader->line); > } > + reader->options = original_options; So "original_options" is used to save away the reader->options so that it can be restored before returning to our caller? OK (it may be more common in this codebase to call such a variable "saved_X", though). > diff --git a/pkt-line.c b/pkt-line.c > index de4a94b437e..8da8ed88ccf 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -443,7 +443,12 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, > len--; > > buffer[len] = 0; > - packet_trace(buffer, len, 0); > + if (options & PACKET_READ_REDACT_ON_TRACE) { > + const char *redacted = "<redacted>"; > + packet_trace(redacted, strlen(redacted), 0); > + } else { > + packet_trace(buffer, len, 0); > + } > ... > + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ > + git -c protocol.version=2 \ > + -c fetch.uriprotocols=http,https \ > + clone "$HTTPD_URL/smart/http_parent" http_child && > + > + grep "clone< <redacted>" log This checks only that "redacted" string appears, but what the theme of the change really cares about is different, no? You want to ensure that no sensitive substring of the URI appears in the log. Imagine somebody breaking the redact logic by making it prepend that string to the payload, instead of replacing the payload with that string---this test will not catch such a regression. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2] fetch-pack: redact packfile urls in traces 2021-10-11 20:39 ` Junio C Hamano @ 2021-10-26 19:32 ` Ivan Frade 0 siblings, 0 replies; 43+ messages in thread From: Ivan Frade @ 2021-10-26 19:32 UTC (permalink / raw) To: Junio C Hamano Cc: Ivan Frade via GitGitGadget, git, Ævar Arnfjörð Bjarmason It seems I sent my original reply only to the github PR. Sorry for the confusion: On Mon, Oct 11, 2021 at 1:39 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Ivan Frade <ifrade@google.com> ... > > It of course is a different matter if the explained idea is > agreeable, though ;-). Hiding the entire packet, based on the "it > might be in some setups" seems a bit too much. > > Is it often the case that the whole URI is sensitive, or perhaps > leading "<scheme>://<host>/pack-<abc>.pack" part is not sensitive at > all, and what follows after that "public" part has some "nonce" > material that makes it sensitive? In the specific case I am working on, the path of the URL is an encrypted string that shouldn't be completely exposed (exposing part of it would be fine). In general, I think we can assume that <scheme>://<host>/ are always "public" but the path could be sensitive. We could redact only the path (<scheme>://<host>/REDACTED), or even a fixed length of the URL? (<scheme>://<host>/pack-<xxREDACTED). In the next patch version I go with redacting the path. > > Changes since v1: ... > Please write such material below the three-dash line. Done > And there is no need to duplicate the log message here ;-) Done > So "original_options" is used to save away the reader->options so > that it can be restored before returning to our caller? > > OK (it may be more common in this codebase to call such a variable > "saved_X", though). In the latest iteration, the option is enabled for all sections and there is no need to set/unset the flag. > > + grep "clone< <redacted>" log > > This checks only that "redacted" string appears, but what the theme > of the change really cares about is different, no? You want to > ensure that no sensitive substring of the URI appears in the log. > > Imagine somebody breaking the redact logic by making it prepend that > string to the payload, instead of replacing the payload with that > string---this test will not catch such a regression. Now the tests verify the expected packfile-uri full line is in the log. Thanks, Ivan ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v3] fetch-pack: redact packfile urls in traces 2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-11 20:39 ` Junio C Hamano @ 2021-10-19 22:57 ` Ivan Frade via GitGitGadget 2021-10-20 11:41 ` Ævar Arnfjörð Bjarmason 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget 1 sibling, 2 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-19 22:57 UTC (permalink / raw) To: git; +Cc: Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- fetch-pack: redact packfile urls in traces Changes since v1: * Redact only the path of the URL * Test are now strict, validating the exact line expected in the log Changes since v1: * Removed non-POSIX flags in tests * More accurate regex for the non-encrypted packfile line * Dropped documentation change * Dropped redacting the die message in http-fetch Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v2: 1: 701cb7a6ab9 ! 1: 9afe0093af4 fetch-pack: redact packfile urls in traces @@ Commit message recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. - Redact the packfile-uri lines by default, unless the GIT_TRACE_REDACT - variable is set to false. This mimics the redacting of the - Authorization header in HTTP. - - Changes since v1: - - Removed non-POSIX flags in tests - - More accurate regex for the non-encrypted packfile line - - Dropped documentation change - - Dropped redacting the die message in http-fetch + Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT + variable is set to false. This mimics the redacting of the Authorization + header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> @@ fetch-pack.c: static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { -+ int original_options; ++ int saved_options; process_section_header(reader, "packfile-uris", 0); + /* + * In some setups, packfile-uris act as bearer tokens, + * redact them by default. + */ -+ original_options = reader->options; ++ saved_options = reader->options; + if (git_env_bool("GIT_TRACE_REDACT", 1)) -+ reader->options |= PACKET_READ_REDACT_ON_TRACE; ++ reader->options |= PACKET_READ_REDACT_URL_PATH; + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || @@ fetch-pack.c: static void receive_packfile_uris(struct packet_reader *reader, string_list_append(uris, reader->line); } -+ reader->options = original_options; ++ reader->options = saved_options; + if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } ## pkt-line.c ## +@@ pkt-line.c: int packet_length(const char lenbuf_hex[4]) + return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); + } + ++static int find_url_path_start(const char* buffer) ++{ ++ const char *URL_MARK = "://"; ++ char *p = strstr(buffer, URL_MARK); ++ if (!p) { ++ return -1; ++ } ++ ++ p += strlen(URL_MARK); ++ while (*p && *p != '/') ++ p++; ++ ++ // Position after '/' ++ if (*p && *(p + 1)) ++ return (p + 1) - buffer; ++ ++ return -1; ++} ++ + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, + size_t *src_len, char *buffer, + unsigned size, int *pktlen, +@@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_buffer, + { + int len; + char linelen[4]; ++ int url_path_start; + + if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { + *pktlen = -1; @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); -+ if (options & PACKET_READ_REDACT_ON_TRACE) { ++ if (options & PACKET_READ_REDACT_URL_PATH && ++ (url_path_start = find_url_path_start(buffer)) != -1) { + const char *redacted = "<redacted>"; -+ packet_trace(redacted, strlen(redacted), 0); ++ struct strbuf tracebuf = STRBUF_INIT; ++ strbuf_insert(&tracebuf, 0, buffer, len); ++ strbuf_splice(&tracebuf, url_path_start, ++ len - url_path_start, redacted, strlen(redacted)); ++ packet_trace(tracebuf.buf, tracebuf.len, 0); ++ strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } @@ pkt-line.h: void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) -+#define PACKET_READ_REDACT_ON_TRACE (1u<<4) ++#define PACKET_READ_REDACT_URL_PATH (1u<<4) int packet_read(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int options); @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje test_i18ngrep "disallowed submodule name" err ' -+test_expect_success 'packfile-uri redacted in trace' ' ++test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + git -C "$P" add my-blob && + git -C "$P" commit -m x && + -+ configure_exclusion "$P" my-blob >h && ++ git -C "$P" hash-object my-blob >objh && ++ git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && ++ git -C "$P" config --add \ ++ "uploadpack.blobpackfileuri" \ ++ "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + -+ grep "clone< <redacted>" log ++ grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + -+test_expect_success 'packfile-uri not redacted in trace when GIT_TRACE_REDACT=0' ' ++test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + git -C "$P" add my-blob && + git -C "$P" commit -m x && + -+ configure_exclusion "$P" my-blob >h && ++ git -C "$P" hash-object my-blob >objh && ++ git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && ++ git -C "$P" config --add \ ++ "uploadpack.blobpackfileuri" \ ++ "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + -+ grep -E "clone< ..[0-9a-f]{40,64} http://" log ++ grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' fetch-pack.c | 11 +++++++++ pkt-line.c | 33 ++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 95 insertions(+), 1 deletion(-) diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..1587b9ae662 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1518,7 +1518,16 @@ static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { + int saved_options; process_section_header(reader, "packfile-uris", 0); + /* + * In some setups, packfile-uris act as bearer tokens, + * redact them by default. + */ + saved_options = reader->options; + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader->options |= PACKET_READ_REDACT_URL_PATH; + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || reader->line[the_hash_algo->hexsz] != ' ') @@ -1526,6 +1535,8 @@ static void receive_packfile_uris(struct packet_reader *reader, string_list_append(uris, reader->line); } + reader->options = saved_options; + if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } diff --git a/pkt-line.c b/pkt-line.c index de4a94b437e..1a9e6870559 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -386,6 +386,25 @@ int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } +static int find_url_path_start(const char* buffer) +{ + const char *URL_MARK = "://"; + char *p = strstr(buffer, URL_MARK); + if (!p) { + return -1; + } + + p += strlen(URL_MARK); + while (*p && *p != '/') + p++; + + // Position after '/' + if (*p && *(p + 1)) + return (p + 1) - buffer; + + return -1; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int *pktlen, @@ -393,6 +412,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, { int len; char linelen[4]; + int url_path_start; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ -443,7 +463,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URL_PATH && + (url_path_start = find_url_path_start(buffer)) != -1) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); + strbuf_splice(&tracebuf, url_path_start, + len - url_path_start, redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 82b95e4bdd3..853d20688c8 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -88,6 +88,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URL_PATH (1u<<4) int packet_read(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int options); diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..f01af2f2ed3 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && base-commit: 9d530dc0024503ab4218fe6c4395b8a0aa245478 -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v3] fetch-pack: redact packfile urls in traces 2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget @ 2021-10-20 11:41 ` Ævar Arnfjörð Bjarmason 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget 1 sibling, 0 replies; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 11:41 UTC (permalink / raw) To: Ivan Frade via GitGitGadget; +Cc: git, Ivan Frade On Tue, Oct 19 2021, Ivan Frade via GitGitGadget wrote: > From: Ivan Frade <ifrade@google.com> > > In some setups, packfile uris act as bearer token. It is not > recommended to expose them plainly in logs, although in special > circunstances (e.g. debug) it makes sense to write them. > > Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT > variable is set to false. This mimics the redacting of the Authorization > header in HTTP. > > Signed-off-by: Ivan Frade <ifrade@google.com> > --- > fetch-pack: redact packfile urls in traces > > Changes since v1: Just context for other reviewers: s/Changes since v1/Changes since v2/ I see, from the context of https://lore.kernel.org/git/pull.1052.v2.git.1633746024175.gitgitgadget@gmail.com/ > * Redact only the path of the URL > * Test are now strict, validating the exact line expected in the log And this was changed in v2... > Changes since v1: > > * Removed non-POSIX flags in tests > * More accurate regex for the non-encrypted packfile line [...] > * Dropped documentation change > * Dropped redacting the die message in http-fetch Since both of those were done I think in response to my feedback I just want to clarify (if needed): * That the documentation change is still good to have, although I had feedback on fixing that more generally in the protocol v2 docs. It would be great if you could still pursue it (and that I didn't discourage you from doing so). * I think having this redaction in die() could still be valuable, e.g. your packfile-uri's start failing, and now users are copy/pasting "private" URLs that contain their passwords or whatever to try to get help, that would be bad. But perhaps if you don't have private URLs redacting them unconditionally would slow down debugging for some, i.e. you have 10x pasted URLs, and all the errors are from one set of servers (although your current redaction includes the hostname, which I think would address most cases of say one CDN node failing). It was really just a comment that your v1's commit message didn't mention or justify it, but just having it make a mention of it would also be an OK solution, or fold that into another patch or whatever... > { > + int saved_options; > process_section_header(reader, "packfile-uris", 0); > + /* > + * In some setups, packfile-uris act as bearer tokens, > + * redact them by default. > + */ > + saved_options = reader->options; nit: no need to pre-declare "int saved_options" here, just move this & the comment above "process_section_header" (in this case I'd say a comment isn't even needed, obvious from context...), or it should be at the definition of PACKET_READ_REDACT_URL_PATH... (more below) > + if (git_env_bool("GIT_TRACE_REDACT", 1)) If we're going to use GIT_TRACE_REDACT for this the documentation needs updating: Documentation/git.txt:`GIT_TRACE_REDACT`:: Documentation/git.txt- By default, when tracing is activated, Git redacts the values of Documentation/git.txt- cookies, the "Authorization:" header, and the "Proxy-Authorization:" Documentation/git.txt- header. Set this variable to `0` to prevent this redaction. > + reader->options |= PACKET_READ_REDACT_URL_PATH; (continued)... but that was from a really narrow reading of the code, I think this whole flip-flopping of options back and forth isn't needed at all, and you should just assign this flag at the top of do_fetch_pack_v2(), no? The setting of it also looks like it belongs with the reading of "GIT_TEST_SIDEBAND_ALL". I.e. nothing else uses PACKET_READ_REDACT_URL_PATH, why do we need to be flipping it back & forth? Keeping these flags in the "reader" is what that member is for, isn't it? Maybe I'm missing something. > +static int find_url_path_start(const char* buffer) > +{ > + const char *URL_MARK = "://"; > + char *p = strstr(buffer, URL_MARK); > + if (!p) { > + return -1; > + } > + > + p += strlen(URL_MARK); > + while (*p && *p != '/') > + p++; > + > + // Position after '/' > + if (*p && *(p + 1)) > + return (p + 1) - buffer; > + > + return -1; > +} I think that packfile URI only supports http:// and https://, not file:// or whatever, so I wonder if either curl or we have a helper function for this that we can use....(more below) > enum packet_read_status packet_read_with_status(int fd, char **src_buffer, > size_t *src_len, char *buffer, > unsigned size, int *pktlen, > @@ -393,6 +412,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, > { > int len; > char linelen[4]; > + int url_path_start; > > if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { > *pktlen = -1; > @@ -443,7 +463,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, > len--; > > buffer[len] = 0; > - packet_trace(buffer, len, 0); > + if (options & PACKET_READ_REDACT_URL_PATH && > + (url_path_start = find_url_path_start(buffer)) != -1) { > + const char *redacted = "<redacted>"; > + struct strbuf tracebuf = STRBUF_INIT; > + strbuf_insert(&tracebuf, 0, buffer, len); > + strbuf_splice(&tracebuf, url_path_start, > + len - url_path_start, redacted, strlen(redacted)); > + packet_trace(tracebuf.buf, tracebuf.len, 0); > + strbuf_release(&tracebuf); > + } else { > + packet_trace(buffer, len, 0); > + } ...If we're redacting the URL isn't (and this might be less code with a helper function) saying: failed to get 'https' url from 'somehost.example.com' (full URL redacted due to XYZ setting) Friendlier than something like (which this function sets up): failed to get 'https://somehost.example.com/<redacted>' url I.e. it allows us to use a get_schema_from_url() and get_host_from_url() functions (I don't know if/where we have those, but seems likelier than "find path url boundary" (or maybe I'm wrong and we always feed those as-is to curl et al). ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v4 0/2] fetch-pack: redact packfile urls in traces 2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget 2021-10-20 11:41 ` Ævar Arnfjörð Bjarmason @ 2021-10-26 22:49 ` Ivan Frade via GitGitGadget 2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget ` (2 more replies) 1 sibling, 3 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-26 22:49 UTC (permalink / raw) To: git; +Cc: Ivan Frade Changes since v3: * Enable redacting URLs for all sections * Redact only URL path (it was until the end of line) * Redact URL in die() with more friendly message * Update doc to mention that packfile URIs are also redacted. Changes since v2: * Redact only the path of the URL * Test are now strict, validating the exact line expected in the log Changes since v1: * Removed non-POSIX flags in tests * More accurate regex for the non-encrypted packfile line * Dropped documentation change * Dropped redacting the die message in http-fetch Ivan Frade (2): fetch-pack: redact packfile urls in traces http-fetch: redact url on die() message Documentation/git.txt | 5 +++-- fetch-pack.c | 3 +++ http-fetch.c | 15 +++++++++++-- pkt-line.c | 40 ++++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 110 insertions(+), 5 deletions(-) base-commit: e9e5ba39a78c8f5057262d49e261b42a8660d5b9 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v3: 1: 9afe0093af4 ! 1: 973a250752c fetch-pack: redact packfile urls in traces @@ Commit message Signed-off-by: Ivan Frade <ifrade@google.com> - ## fetch-pack.c ## -@@ fetch-pack.c: static void receive_wanted_refs(struct packet_reader *reader, - static void receive_packfile_uris(struct packet_reader *reader, - struct string_list *uris) - { -+ int saved_options; - process_section_header(reader, "packfile-uris", 0); -+ /* -+ * In some setups, packfile-uris act as bearer tokens, -+ * redact them by default. -+ */ -+ saved_options = reader->options; -+ if (git_env_bool("GIT_TRACE_REDACT", 1)) -+ reader->options |= PACKET_READ_REDACT_URL_PATH; -+ - while (packet_reader_read(reader) == PACKET_READ_NORMAL) { - if (reader->pktlen < the_hash_algo->hexsz || - reader->line[the_hash_algo->hexsz] != ' ') -@@ fetch-pack.c: static void receive_packfile_uris(struct packet_reader *reader, + ## Documentation/git.txt ## +@@ Documentation/git.txt: for full details. - string_list_append(uris, reader->line); + `GIT_TRACE_REDACT`:: + By default, when tracing is activated, Git redacts the values of +- cookies, the "Authorization:" header, and the "Proxy-Authorization:" +- header. Set this variable to `0` to prevent this redaction. ++ cookies, the "Authorization:" header, the "Proxy-Authorization:" ++ header and packfile URLs. Set this variable to `0` to prevent this ++ redaction. + + `GIT_LITERAL_PATHSPECS`:: + Setting this variable to `1` will cause Git to treat all + + ## fetch-pack.c ## +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, + reader.me = "fetch-pack"; } -+ reader->options = saved_options; + ++ if (git_env_bool("GIT_TRACE_REDACT", 1)) ++ reader.options |= PACKET_READ_REDACT_URL_PATH; + - if (reader->status != PACKET_READ_DELIM) - die("expected DELIM"); - } + while (state != FETCH_DONE) { + switch (state) { + case FETCH_CHECK_LOCAL: ## pkt-line.c ## @@ pkt-line.c: int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } -+static int find_url_path_start(const char* buffer) ++static char *find_url_path(const char* buffer, int *path_len) +{ + const char *URL_MARK = "://"; -+ char *p = strstr(buffer, URL_MARK); -+ if (!p) { -+ return -1; -+ } ++ char *path = strstr(buffer, URL_MARK); ++ if (!path) ++ return NULL; + -+ p += strlen(URL_MARK); -+ while (*p && *p != '/') -+ p++; ++ path += strlen(URL_MARK); ++ while (*path && *path != '/') ++ path++; + -+ // Position after '/' -+ if (*p && *(p + 1)) -+ return (p + 1) - buffer; ++ if (!*path || !*(path + 1)) ++ return NULL; ++ ++ // position after '/' ++ path++; ++ ++ if (path_len) { ++ char *url_end = strchrnul(path, ' '); ++ *path_len = url_end - path; ++ } + -+ return -1; ++ return path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_b { int len; char linelen[4]; -+ int url_path_start; ++ char *url_path_start; ++ int url_path_len; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_b buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URL_PATH && -+ (url_path_start = find_url_path_start(buffer)) != -1) { ++ (url_path_start = find_url_path(buffer, &url_path_len))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); -+ strbuf_splice(&tracebuf, url_path_start, -+ len - url_path_start, redacted, strlen(redacted)); ++ strbuf_splice(&tracebuf, url_path_start - buffer, ++ url_path_len, redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { @@ pkt-line.h: void packet_fflush(FILE *f); #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URL_PATH (1u<<4) - int packet_read(int fd, char **src_buffer, size_t *src_len, char - *buffer, unsigned size, int options); + int packet_read(int fd, char *buffer, unsigned size, int options); + /* ## t/t5702-protocol-v2.sh ## @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul -: ----------- > 2: c7f0977cabd http-fetch: redact url on die() message -- gitgitgadget ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v4 1/2] fetch-pack: redact packfile urls in traces 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget @ 2021-10-26 22:49 ` Ivan Frade via GitGitGadget 2021-10-28 1:01 ` Junio C Hamano 2021-10-26 22:49 ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-26 22:49 UTC (permalink / raw) To: git; +Cc: Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- Documentation/git.txt | 5 +++-- fetch-pack.c | 3 +++ pkt-line.c | 40 ++++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 97 insertions(+), 3 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index d63c65e67d8..f64c8ce5183 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -832,8 +832,9 @@ for full details. `GIT_TRACE_REDACT`:: By default, when tracing is activated, Git redacts the values of - cookies, the "Authorization:" header, and the "Proxy-Authorization:" - header. Set this variable to `0` to prevent this redaction. + cookies, the "Authorization:" header, the "Proxy-Authorization:" + header and packfile URLs. Set this variable to `0` to prevent this + redaction. `GIT_LITERAL_PATHSPECS`:: Setting this variable to `1` will cause Git to treat all diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..ad8ac49ca50 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1581,6 +1581,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, reader.me = "fetch-pack"; } + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader.options |= PACKET_READ_REDACT_URL_PATH; + while (state != FETCH_DONE) { switch (state) { case FETCH_CHECK_LOCAL: diff --git a/pkt-line.c b/pkt-line.c index 2dc8ac274bd..ba0a2d65f0c 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } +static char *find_url_path(const char* buffer, int *path_len) +{ + const char *URL_MARK = "://"; + char *path = strstr(buffer, URL_MARK); + if (!path) + return NULL; + + path += strlen(URL_MARK); + while (*path && *path != '/') + path++; + + if (!*path || !*(path + 1)) + return NULL; + + // position after '/' + path++; + + if (path_len) { + char *url_end = strchrnul(path, ' '); + *path_len = url_end - path; + } + + return path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int *pktlen, @@ -377,6 +402,8 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, { int len; char linelen[4]; + char *url_path_start; + int url_path_len; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ -427,7 +454,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URL_PATH && + (url_path_start = find_url_path(buffer, &url_path_len))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); + strbuf_splice(&tracebuf, url_path_start - buffer, + url_path_len, redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 467ae013573..a610ecb88e8 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -87,6 +87,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URL_PATH (1u<<4) int packet_read(int fd, char *buffer, unsigned size, int options); /* diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..f01af2f2ed3 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v4 1/2] fetch-pack: redact packfile urls in traces 2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget @ 2021-10-28 1:01 ` Junio C Hamano 2021-10-28 22:15 ` Ivan Frade 0 siblings, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2021-10-28 1:01 UTC (permalink / raw) To: Ivan Frade via GitGitGadget; +Cc: git, Ivan Frade, Jonathan Tan "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Ivan Frade <ifrade@google.com> > > In some setups, packfile uris act as bearer token. It is not > recommended to expose them plainly in logs, although in special > circunstances (e.g. debug) it makes sense to write them. > > Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT > variable is set to false. This mimics the redacting of the Authorization > header in HTTP. > > Signed-off-by: Ivan Frade <ifrade@google.com> > --- > Documentation/git.txt | 5 +++-- > fetch-pack.c | 3 +++ > pkt-line.c | 40 ++++++++++++++++++++++++++++++++- > pkt-line.h | 1 + > t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 97 insertions(+), 3 deletions(-) > > diff --git a/Documentation/git.txt b/Documentation/git.txt > index d63c65e67d8..f64c8ce5183 100644 > --- a/Documentation/git.txt > +++ b/Documentation/git.txt > @@ -832,8 +832,9 @@ for full details. > > `GIT_TRACE_REDACT`:: > By default, when tracing is activated, Git redacts the values of > - cookies, the "Authorization:" header, and the "Proxy-Authorization:" > - header. Set this variable to `0` to prevent this redaction. > + cookies, the "Authorization:" header, the "Proxy-Authorization:" > + header and packfile URLs. Set this variable to `0` to prevent this > + redaction. Just a curiosity. Do we call these packfile URI, or packfile URL? > diff --git a/pkt-line.c b/pkt-line.c > index 2dc8ac274bd..ba0a2d65f0c 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) > return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); > } > > +static char *find_url_path(const char* buffer, int *path_len) > +{ > + const char *URL_MARK = "://"; > + char *path = strstr(buffer, URL_MARK); > + if (!path) > + return NULL; Hmph, the format we expect is "<hash> <uri>"; don't we need to validate the leading <hash> followed by SP? len = strspn(buffer, "0123456789abcdefABCDEF"); if (len != 40 || len != 64 || buffer[len] != ' ') return NULL; /* required "<hash> SP" not seen */ path = strstr(buffer + len + 1, URL_MARK); or somesuch? > + path += strlen(URL_MARK); OK. > + while (*path && *path != '/') > + path++; strchr()? > + if (!*path || !*(path + 1)) > + return NULL; OK. > + // position after '/' No // comments in our codebase, please. Unless it is a borrowed code, that is. > + path++; > + > + if (path_len) { > + char *url_end = strchrnul(path, ' '); Is this because SP is not a valid character in packfile URI, or at this point in the callchain it would be encoded or something? The format we expect is "<hash> <uri>", so we shouldn't even have to look for SP but just redact everything to the end, no? Apparently we are assuming that there won't be more than one such URL-path that needs redacting in the packet, but that is perfectly fine, as the sole goal of this helper is to identify the packfile URI packet and redact it in the log. > + *path_len = url_end - path; > + } > + > + return path; > +} > - packet_trace(buffer, len, 0); > + if (options & PACKET_READ_REDACT_URL_PATH && > + (url_path_start = find_url_path(buffer, &url_path_len))) { > + const char *redacted = "<redacted>"; > + struct strbuf tracebuf = STRBUF_INIT; > + strbuf_insert(&tracebuf, 0, buffer, len); > + strbuf_splice(&tracebuf, url_path_start - buffer, > + url_path_len, redacted, strlen(redacted)); > + packet_trace(tracebuf.buf, tracebuf.len, 0); > + strbuf_release(&tracebuf); I briefly wondered if the repeated allocation (and more fundamentally, preparing the redacted copy of packet whether we are actually tracing the packet in the first place) is blindly wasting the resources too much, but this only happens in the protocol header part, so it might be OK. Even if that is not the case, we should be able to update fetch_pack.c::do_fetch_pack_v2() so that the REDACT_URL_PATH bit is turned on in a much narrower region of code, right? Enable when we enter the GET_PACK state and drop the bit when we are done with the packfile URI packets, or something? Thanks for working on this. > + } else { > + packet_trace(buffer, len, 0); > + } ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 1/2] fetch-pack: redact packfile urls in traces 2021-10-28 1:01 ` Junio C Hamano @ 2021-10-28 22:15 ` Ivan Frade 2021-10-28 22:46 ` Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: Ivan Frade @ 2021-10-28 22:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ivan Frade via GitGitGadget, git, Jonathan Tan On Wed, Oct 27, 2021 at 6:01 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Ivan Frade <ifrade@google.com> > > > Just a curiosity. Do we call these packfile URI, or packfile URL? The feature is "packfile URI" (and the section is called so in the protocol). I changed all "url" to "uri". > > diff --git a/pkt-line.c b/pkt-line.c > > index 2dc8ac274bd..ba0a2d65f0c 100644 > > --- a/pkt-line.c > > +++ b/pkt-line.c > > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) > > return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); > > } > > > > +static char *find_url_path(const char* buffer, int *path_len) > > +{ > > + const char *URL_MARK = "://"; > > + char *path = strstr(buffer, URL_MARK); > > + if (!path) > > + return NULL; > > Hmph, the format we expect is "<hash> <uri>"; don't we need to > validate the leading <hash> followed by SP? I was trying to find a uri in a packet in general, not counting on the packfile-uri line format. That is probably an overgeneralization. Next patch version follows these suggestions to look for a packfile-uri line. > > + if (path_len) { > > + char *url_end = strchrnul(path, ' '); > > Is this because SP is not a valid character in packfile URI, or at > this point in the callchain it would be encoded or something? The > format we expect is "<hash> <uri>", so we shouldn't even have to > look for SP but just redact everything to the end, no? Yes, now that we count on the packfile-uri line format, we can redact everything to the end and there is no need to return the length. > > - packet_trace(buffer, len, 0); > > + if (options & PACKET_READ_REDACT_URL_PATH && > > + (url_path_start = find_url_path(buffer, &url_path_len))) { > > + const char *redacted = "<redacted>"; > > + struct strbuf tracebuf = STRBUF_INIT; > > + strbuf_insert(&tracebuf, 0, buffer, len); > > + strbuf_splice(&tracebuf, url_path_start - buffer, > > + url_path_len, redacted, strlen(redacted)); > > + packet_trace(tracebuf.buf, tracebuf.len, 0); > > + strbuf_release(&tracebuf); > > I briefly wondered if the repeated allocation (and more > fundamentally, preparing the redacted copy of packet whether we are > actually tracing the packet in the first place) is blindly wasting > the resources too much, but this only happens in the protocol header > part, so it might be OK. We only allocate and redact if it looks like a packfile-uri line, so it shouldn't happen too frequently. > Even if that is not the case, we should be able to update > fetch_pack.c::do_fetch_pack_v2() so that the REDACT_URL_PATH bit is > turned on in a much narrower region of code, right? Enable when we > enter the GET_PACK state and drop the bit when we are done with the > packfile URI packets, or something? I move the set/unset of the redacting flag to the FETCH_GET_PACK around the "packfile-uris" section. There is no need to check every incoming packet for a packfile-uri line, we know when they should come. Thanks, Ivan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 1/2] fetch-pack: redact packfile urls in traces 2021-10-28 22:15 ` Ivan Frade @ 2021-10-28 22:46 ` Junio C Hamano 0 siblings, 0 replies; 43+ messages in thread From: Junio C Hamano @ 2021-10-28 22:46 UTC (permalink / raw) To: Ivan Frade; +Cc: Ivan Frade via GitGitGadget, git, Jonathan Tan Ivan Frade <ifrade@google.com> writes: >> Hmph, the format we expect is "<hash> <uri>"; don't we need to >> validate the leading <hash> followed by SP? > > I was trying to find a uri in a packet in general, not counting on the > packfile-uri line format. That is probably an overgeneralization. Ah, I see. This is merely a tracing, so we might benefit from a generalized version of redactor, and from that point of view, the use of strstr and stopping at the whitespace do make sort-of sense to me, but then we lack any attempt to redact more than one instance of URL in a packet, so the generalization may have quite a limited usefulness. > Next patch version follows these suggestions to look for a packfile-uri line. Yeah, I think that is a good way to go, at least for now. When we want a more general one, we can revisit it, but not now. >> > - packet_trace(buffer, len, 0); >> > + if (options & PACKET_READ_REDACT_URL_PATH && >> > + (url_path_start = find_url_path(buffer, &url_path_len))) { >> > + const char *redacted = "<redacted>"; >> > + struct strbuf tracebuf = STRBUF_INIT; >> > + strbuf_insert(&tracebuf, 0, buffer, len); >> > + strbuf_splice(&tracebuf, url_path_start - buffer, >> > + url_path_len, redacted, strlen(redacted)); >> > + packet_trace(tracebuf.buf, tracebuf.len, 0); >> > + strbuf_release(&tracebuf); >> >> I briefly wondered if the repeated allocation (and more >> fundamentally, preparing the redacted copy of packet whether we are >> actually tracing the packet in the first place) is blindly wasting >> the resources too much, but this only happens in the protocol header >> part, so it might be OK. > > We only allocate and redact if it looks like a packfile-uri line, so > it shouldn't happen too frequently. I was mostly wondering about the cost of determining "if it looks like?". But we do this only for the protocol header part, so we won't have thousands of attempts to match, I guess. Oh, or if we also do this for the ref advertisement packets, then we might have quite a many. Hmph. > I move the set/unset of the redacting flag to the FETCH_GET_PACK > around the "packfile-uris" section. > There is no need to check every incoming packet for a packfile-uri > line, we know when they should come. Yeah, that is quite a wise design decision, I would think. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget 2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget @ 2021-10-26 22:49 ` Ivan Frade via GitGitGadget 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-26 22:49 UTC (permalink / raw) To: git; +Cc: Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> http-fetch prints the URL after failing to fetch it. This can be confusing to users (they cannot really do anything with it) but even worse, they can share by accident a sensitive URL (e.g. with credentials) while looking for help. Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This mimics the redaction of other sensitive information in git, like the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- http-fetch.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/http-fetch.c b/http-fetch.c index fa642462a9e..bbe09a6ad9f 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -4,6 +4,7 @@ #include "http.h" #include "walker.h" #include "strvec.h" +#include "urlmatch.h" static const char http_fetch_usage[] = "git http-fetch " "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url"; @@ -63,8 +64,18 @@ static void fetch_single_packfile(struct object_id *packfile_hash, if (start_active_slot(preq->slot)) { run_active_slot(preq->slot); if (results.curl_result != CURLE_OK) { - die("Unable to get pack file %s\n%s", preq->url, - curl_errorstr); + struct url_info url; + char *nurl = url_normalize(preq->url, &url); + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { + die("Unable to get pack file %s\n%s", preq->url, + curl_errorstr); + } else { + char *schema = xstrndup(url.url, url.scheme_len); + char *host = xstrndup(&url.url[url.host_off], url.host_len); + die("failed to get '%s' url from '%s' " + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", + schema, host, curl_errorstr); + } } } else { die("Unable to start request"); -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-26 22:49 ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason 2021-10-28 17:25 ` Eric Sunshine ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-28 16:39 UTC (permalink / raw) To: Ivan Frade via GitGitGadget; +Cc: git, Ivan Frade On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote: > From: Ivan Frade <ifrade@google.com> > > http-fetch prints the URL after failing to fetch it. This can be > confusing to users (they cannot really do anything with it) but even > worse, they can share by accident a sensitive URL (e.g. with > credentials) while looking for help. > > Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This > mimics the redaction of other sensitive information in git, like the > Authorization header in HTTP. > > Signed-off-by: Ivan Frade <ifrade@google.com> > --- > http-fetch.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/http-fetch.c b/http-fetch.c > index fa642462a9e..bbe09a6ad9f 100644 > --- a/http-fetch.c > +++ b/http-fetch.c > @@ -4,6 +4,7 @@ > #include "http.h" > #include "walker.h" > #include "strvec.h" > +#include "urlmatch.h" > > static const char http_fetch_usage[] = "git http-fetch " > "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url"; > @@ -63,8 +64,18 @@ static void fetch_single_packfile(struct object_id *packfile_hash, > if (start_active_slot(preq->slot)) { > run_active_slot(preq->slot); > if (results.curl_result != CURLE_OK) { > - die("Unable to get pack file %s\n%s", preq->url, > - curl_errorstr); > + struct url_info url; > + char *nurl = url_normalize(preq->url, &url); > + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { > + die("Unable to get pack file %s\n%s", preq->url, > + curl_errorstr); small nit: arrange if's from "if (cheap || expensive)", i.e. no need for getenv() if !nurl, but maybe compilers are smart enough for that... nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so: die("unable to get pack '%s': '%s'", ...) Or maybe without the second '%s', as in 3e8084f1884 (http: check CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which I authored, but just copy/pasted the convention in the surrounding code)> > + } else { > + char *schema = xstrndup(url.url, url.scheme_len); > + char *host = xstrndup(&url.url[url.host_off], url.host_len); > + die("failed to get '%s' url from '%s' " > + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", > + schema, host, curl_errorstr); Hrm, I haven't tested, but aren't both of those xstrndup's redundant to using %*s instead of %s for the printf format? I.e.: die("failed to get '%*s'[...]", url.schema_len, url.url, ) ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason @ 2021-10-28 17:25 ` Eric Sunshine 2021-10-28 22:44 ` Ivan Frade 2021-10-28 22:41 ` Ivan Frade 2021-10-29 23:18 ` Junio C Hamano 2 siblings, 1 reply; 43+ messages in thread From: Eric Sunshine @ 2021-10-28 17:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Ivan Frade via GitGitGadget, Git List, Ivan Frade On Thu, Oct 28, 2021 at 12:46 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote: > > if (results.curl_result != CURLE_OK) { > > - die("Unable to get pack file %s\n%s", preq->url, > > - curl_errorstr); > > + struct url_info url; > > + char *nurl = url_normalize(preq->url, &url); > > + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { > > + die("Unable to get pack file %s\n%s", preq->url, > > + curl_errorstr); > > small nit: arrange if's from "if (cheap || expensive)", i.e. no need for > getenv() if !nurl, but maybe compilers are smart enough for that... I had the same passing thought when glancing over this code (although this appears to be an error patch, thus not performance critical, so not terribly important). > nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so: > > die("unable to get pack '%s': '%s'", ...) > > Or maybe without the second '%s', as in 3e8084f1884 (http: check > CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which > I authored, but just copy/pasted the convention in the surrounding > code)> Note that this is not a new die() call; it just got indented as-is by this patch, so the changes you suggest to the message string are potentially outside the scope of this patch. Possibilities: (1) make the changes in this patch but mention them in the commit message; (2) make the changes in a preparatory patch; (3) punt on the changes for now. > > + } else { > > + char *schema = xstrndup(url.url, url.scheme_len); > > + char *host = xstrndup(&url.url[url.host_off], url.host_len); > > + die("failed to get '%s' url from '%s' " > > + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", > > + schema, host, curl_errorstr); > > Hrm, I haven't tested, but aren't both of those xstrndup's redundant to > using %*s instead of %s for the printf format? I.e.: > > die("failed to get '%*s'[...]", url.schema_len, url.url, ) I wondered the same when reading the patch. Thanks for mentioning it. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-28 17:25 ` Eric Sunshine @ 2021-10-28 22:44 ` Ivan Frade 0 siblings, 0 replies; 43+ messages in thread From: Ivan Frade @ 2021-10-28 22:44 UTC (permalink / raw) To: Eric Sunshine Cc: Ævar Arnfjörð Bjarmason, Ivan Frade via GitGitGadget, Git List On Thu, Oct 28, 2021 at 10:26 AM Eric Sunshine <sunshine@sunshineco.com> wrote: > > On Thu, Oct 28, 2021 at 12:46 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > > nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so: > > > > die("unable to get pack '%s': '%s'", ...) > > > > Or maybe without the second '%s', as in 3e8084f1884 (http: check > > CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which > > I authored, but just copy/pasted the convention in the surrounding > > code)> > > Note that this is not a new die() call; it just got indented as-is by > this patch, so the changes you suggest to the message string are > potentially outside the scope of this patch. Possibilities: (1) make > the changes in this patch but mention them in the commit message; (2) > make the changes in a preparatory patch; (3) punt on the changes for > now. I went for option (1). It is a minimal change and we are moving that line in this commit. Ivan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason 2021-10-28 17:25 ` Eric Sunshine @ 2021-10-28 22:41 ` Ivan Frade 2021-10-29 23:18 ` Junio C Hamano 2 siblings, 0 replies; 43+ messages in thread From: Ivan Frade @ 2021-10-28 22:41 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ivan Frade via GitGitGadget, git On Thu, Oct 28, 2021 at 9:46 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Tue, Oct 26 2021, Ivan Frade via GitGitGadget wrote: > > > From: Ivan Frade <ifrade@google.com> > > ... > > + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { > > + die("Unable to get pack file %s\n%s", preq->url, > > + curl_errorstr); > > small nit: arrange if's from "if (cheap || expensive)", i.e. no need for > getenv() if !nurl, but maybe compilers are smart enough for that... Done > nit: die() messages should start with lower-case (in CodingGuidelines), and I think it's better to quote both, so: > > die("unable to get pack '%s': '%s'", ...) > > Or maybe without the second '%s', as in 3e8084f1884 (http: check > CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors, 2021-09-24) (which > I authored, but just copy/pasted the convention in the surrounding > code)> Done > > + } else { > > + char *schema = xstrndup(url.url, url.scheme_len); > > + char *host = xstrndup(&url.url[url.host_off], url.host_len); > > + die("failed to get '%s' url from '%s' " > > + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", > > + schema, host, curl_errorstr); > > Hrm, I haven't tested, but aren't both of those xstrndup's redundant to > using %*s instead of %s for the printf format? I.e.: > > die("failed to get '%*s'[...]", url.schema_len, url.url, ) Indeed, "%.*s" did the trick. Thanks! Ivan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason 2021-10-28 17:25 ` Eric Sunshine 2021-10-28 22:41 ` Ivan Frade @ 2021-10-29 23:18 ` Junio C Hamano 2021-11-09 1:54 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2021-10-29 23:18 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Ivan Frade via GitGitGadget, git, Ivan Frade Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { >> + die("Unable to get pack file %s\n%s", preq->url, >> + curl_errorstr); > > small nit: arrange if's from "if (cheap || expensive)", i.e. no need for > getenv() if !nurl, but maybe compilers are smart enough for that... They typically do not see what happens inside git_env_bool() while compling this compilation unit, and cannot tell if the programmer wanted to call it first for its side effects, hence they cannot swap them safely. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 2/2] http-fetch: redact url on die() message 2021-10-29 23:18 ` Junio C Hamano @ 2021-11-09 1:54 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-09 1:54 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ivan Frade via GitGitGadget, git, Ivan Frade On Fri, Oct 29 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>> + if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { >>> + die("Unable to get pack file %s\n%s", preq->url, >>> + curl_errorstr); >> >> small nit: arrange if's from "if (cheap || expensive)", i.e. no need for >> getenv() if !nurl, but maybe compilers are smart enough for that... > > They typically do not see what happens inside git_env_bool() while > compling this compilation unit, and cannot tell if the programmer > wanted to call it first for its side effects, hence they cannot > swap them safely. *nod*, but since that function is just: int git_env_bool(const char *k, int def) { const char *v = getenv(k); return v ? git_config_bool(k, v) : def; } I was hedging and pondering if some compilers were smart enough these days to optimize things like that. I.e. in this case getenv() is a simple C library function, the env variable is constant, and we do a boolean test of it before calling git_config_bool(). So a sufficiently smart compiler could turn that into: /* global, probably something iterated over env already */ static int __have_seen_GIT_TRACE_REDACT = 0; ... if ((!__have_seen_GIT_TRACE_REDACT || !nurl) || (__have_seen_GIT_TRACE_REDACT && git_env_bool_without_v_bool_check(...))) But probably not, since it wolud need quite a bit of C library cooperation/hooks... ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v5 0/2] fetch-pack: redact packfile urls in traces 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget 2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget 2021-10-26 22:49 ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-10-28 22:51 ` Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget ` (2 more replies) 2 siblings, 3 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-28 22:51 UTC (permalink / raw) To: git; +Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade Changes since v4: * Use "uri" instead of "url" * Look specifically for a line with packfile-uri format (instead of for a URL in general) * Limit the redacting to the packfile-uri section in do_fetch_pack_v2 * Use "%.*s" instead of duplicating parts of the string to print Changes since v3: * Enable redacting URLs for all sections * Redact only URL path (it was until the end of line) * Redact URL in die() with more friendly message * Update doc to mention that packfile URIs are also redacted. Changes since v2: * Redact only the path of the URL * Test are now strict, validating the exact line expected in the log Changes since v1: * Removed non-POSIX flags in tests * More accurate regex for the non-encrypted packfile line * Dropped documentation change * Dropped redacting the die message in http-fetch Ivan Frade (2): fetch-pack: redact packfile urls in traces http-fetch: redact url on die() message Documentation/git.txt | 5 +++-- fetch-pack.c | 4 ++++ http-fetch.c | 14 ++++++++++-- pkt-line.c | 39 +++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 5 deletions(-) base-commit: e9e5ba39a78c8f5057262d49e261b42a8660d5b9 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v4: 1: 973a250752c ! 1: c95b3cafcd6 fetch-pack: redact packfile urls in traces @@ Documentation/git.txt: for full details. - cookies, the "Authorization:" header, and the "Proxy-Authorization:" - header. Set this variable to `0` to prevent this redaction. + cookies, the "Authorization:" header, the "Proxy-Authorization:" -+ header and packfile URLs. Set this variable to `0` to prevent this ++ header and packfile URIs. Set this variable to `0` to prevent this + redaction. `GIT_LITERAL_PATHSPECS`:: @@ Documentation/git.txt: for full details. ## fetch-pack.c ## @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, - reader.me = "fetch-pack"; - } + receive_wanted_refs(&reader, sought, nr_sought); -+ if (git_env_bool("GIT_TRACE_REDACT", 1)) -+ reader.options |= PACKET_READ_REDACT_URL_PATH; -+ - while (state != FETCH_DONE) { - switch (state) { - case FETCH_CHECK_LOCAL: + /* get the pack(s) */ ++ if (git_env_bool("GIT_TRACE_REDACT", 1)) ++ reader.options |= PACKET_READ_REDACT_URI_PATH; + if (process_section_header(&reader, "packfile-uris", 1)) + receive_packfile_uris(&reader, &packfile_uris); ++ reader.options &= ~PACKET_READ_REDACT_URI_PATH; ++ + process_section_header(&reader, "packfile", 0); + + /* ## pkt-line.c ## @@ pkt-line.c: int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } -+static char *find_url_path(const char* buffer, int *path_len) ++static char *find_packfile_uri_path(const char *buffer) +{ -+ const char *URL_MARK = "://"; -+ char *path = strstr(buffer, URL_MARK); -+ if (!path) -+ return NULL; ++ const char *URI_MARK = "://"; ++ char *path; ++ int len; + -+ path += strlen(URL_MARK); -+ while (*path && *path != '/') -+ path++; ++ /* First char is sideband mark */ ++ buffer += 1; + -+ if (!*path || !*(path + 1)) -+ return NULL; ++ len = strspn(buffer, "0123456789abcdefABCDEF"); ++ if (!(len == 40 || len == 64) || buffer[len] != ' ') ++ return NULL; /* required "<hash>SP" not seen */ + -+ // position after '/' -+ path++; ++ path = strstr(buffer + len + 1, URI_MARK); ++ if (!path) ++ return NULL; + -+ if (path_len) { -+ char *url_end = strchrnul(path, ' '); -+ *path_len = url_end - path; -+ } ++ path = strchr(path + strlen(URI_MARK), '/'); ++ if (!path || !*(path + 1)) ++ return NULL; + -+ return path; ++ /* position after '/' */ ++ return ++path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_b { int len; char linelen[4]; -+ char *url_path_start; -+ int url_path_len; ++ char *uri_path_start; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ pkt-line.c: enum packet_read_status packet_read_with_status(int fd, char **src_b buffer[len] = 0; - packet_trace(buffer, len, 0); -+ if (options & PACKET_READ_REDACT_URL_PATH && -+ (url_path_start = find_url_path(buffer, &url_path_len))) { ++ if (options & PACKET_READ_REDACT_URI_PATH && ++ (uri_path_start = find_packfile_uri_path(buffer))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); -+ strbuf_splice(&tracebuf, url_path_start - buffer, -+ url_path_len, redacted, strlen(redacted)); ++ strbuf_splice(&tracebuf, uri_path_start - buffer, ++ strlen(uri_path_start), redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { @@ pkt-line.h: void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) -+#define PACKET_READ_REDACT_URL_PATH (1u<<4) ++#define PACKET_READ_REDACT_URI_PATH (1u<<4) int packet_read(int fd, char *buffer, unsigned size, int options); /* 2: c7f0977cabd ! 2: 6912a690197 http-fetch: redact url on die() message @@ Commit message http-fetch: redact url on die() message http-fetch prints the URL after failing to fetch it. This can be - confusing to users (they cannot really do anything with it) but even - worse, they can share by accident a sensitive URL (e.g. with - credentials) while looking for help. + confusing to users (they cannot really do anything with it), and they + can share by accident a sensitive URL (e.g. with credentials) while + looking for help. Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This mimics the redaction of other sensitive information in git, like the Authorization header in HTTP. + Fix also capitalization of previous die() message (must start in + lowercase). + Signed-off-by: Ivan Frade <ifrade@google.com> ## http-fetch.c ## @@ http-fetch.c: static void fetch_single_packfile(struct object_id *packfile_hash, - curl_errorstr); + struct url_info url; + char *nurl = url_normalize(preq->url, &url); -+ if (!git_env_bool("GIT_TRACE_REDACT", 1) || !nurl) { -+ die("Unable to get pack file %s\n%s", preq->url, ++ if (!nurl || !git_env_bool("GIT_TRACE_REDACT", 1)) { ++ die("unable to get pack file '%s'\n%s", preq->url, + curl_errorstr); + } else { -+ char *schema = xstrndup(url.url, url.scheme_len); -+ char *host = xstrndup(&url.url[url.host_off], url.host_len); -+ die("failed to get '%s' url from '%s' " ++ die("failed to get '%.*s' url from '%.*s' " + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", -+ schema, host, curl_errorstr); ++ (int)url.scheme_len, url.url, ++ (int)url.host_len, &url.url[url.host_off], curl_errorstr); + } } } else { -- gitgitgadget ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v5 1/2] fetch-pack: redact packfile urls in traces 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget @ 2021-10-28 22:51 ` Ivan Frade via GitGitGadget 2021-10-28 23:21 ` Junio C Hamano 2021-10-28 22:51 ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-28 22:51 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- Documentation/git.txt | 5 +++-- fetch-pack.c | 4 ++++ pkt-line.c | 39 +++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 97 insertions(+), 3 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index d63c65e67d8..c91aa2737f0 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -832,8 +832,9 @@ for full details. `GIT_TRACE_REDACT`:: By default, when tracing is activated, Git redacts the values of - cookies, the "Authorization:" header, and the "Proxy-Authorization:" - header. Set this variable to `0` to prevent this redaction. + cookies, the "Authorization:" header, the "Proxy-Authorization:" + header and packfile URIs. Set this variable to `0` to prevent this + redaction. `GIT_LITERAL_PATHSPECS`:: Setting this variable to `1` will cause Git to treat all diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..62ea90541c5 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, receive_wanted_refs(&reader, sought, nr_sought); /* get the pack(s) */ + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader.options |= PACKET_READ_REDACT_URI_PATH; if (process_section_header(&reader, "packfile-uris", 1)) receive_packfile_uris(&reader, &packfile_uris); + reader.options &= ~PACKET_READ_REDACT_URI_PATH; + process_section_header(&reader, "packfile", 0); /* diff --git a/pkt-line.c b/pkt-line.c index 2dc8ac274bd..06013d2a54a 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } +static char *find_packfile_uri_path(const char *buffer) +{ + const char *URI_MARK = "://"; + char *path; + int len; + + /* First char is sideband mark */ + buffer += 1; + + len = strspn(buffer, "0123456789abcdefABCDEF"); + if (!(len == 40 || len == 64) || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ + + path = strstr(buffer + len + 1, URI_MARK); + if (!path) + return NULL; + + path = strchr(path + strlen(URI_MARK), '/'); + if (!path || !*(path + 1)) + return NULL; + + /* position after '/' */ + return ++path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int *pktlen, @@ -377,6 +402,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, { int len; char linelen[4]; + char *uri_path_start; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ -427,7 +453,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URI_PATH && + (uri_path_start = find_packfile_uri_path(buffer))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); + strbuf_splice(&tracebuf, uri_path_start - buffer, + strlen(uri_path_start), redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 467ae013573..6d2a63db238 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -87,6 +87,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URI_PATH (1u<<4) int packet_read(int fd, char *buffer, unsigned size, int options); /* diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..f01af2f2ed3 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v5 1/2] fetch-pack: redact packfile urls in traces 2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget @ 2021-10-28 23:21 ` Junio C Hamano 2021-10-29 18:42 ` Ivan Frade 0 siblings, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2021-10-28 23:21 UTC (permalink / raw) To: Ivan Frade via GitGitGadget Cc: git, Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/pkt-line.c b/pkt-line.c > index 2dc8ac274bd..06013d2a54a 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) > return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); > } > > +static char *find_packfile_uri_path(const char *buffer) > +{ > + const char *URI_MARK = "://"; > + char *path; > + int len; > + > + /* First char is sideband mark */ > + buffer += 1; > + > + len = strspn(buffer, "0123456789abcdefABCDEF"); > + if (!(len == 40 || len == 64) || buffer[len] != ' ') > + return NULL; /* required "<hash>SP" not seen */ People may have comments on hardcoded 40/64 here and offer a better way to write it ;-) > + path = strstr(buffer + len + 1, URI_MARK); > + if (!path) > + return NULL; > + > + path = strchr(path + strlen(URI_MARK), '/'); > + if (!path || !*(path + 1)) > + return NULL; > + > + /* position after '/' */ > + return ++path; > +} Other than that, the patch this round looks quite clean. Nicely done. Thanks, will queue. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v5 1/2] fetch-pack: redact packfile urls in traces 2021-10-28 23:21 ` Junio C Hamano @ 2021-10-29 18:42 ` Ivan Frade 2021-10-29 19:59 ` Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: Ivan Frade @ 2021-10-29 18:42 UTC (permalink / raw) To: Junio C Hamano Cc: Ivan Frade via GitGitGadget, git, Ævar Arnfjörð Bjarmason, Eric Sunshine On Thu, Oct 28, 2021 at 4:21 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > + len = strspn(buffer, "0123456789abcdefABCDEF"); > > + if (!(len == 40 || len == 64) || buffer[len] != ' ') > > + return NULL; /* required "<hash>SP" not seen */ > > People may have comments on hardcoded 40/64 here and offer a better > way to write it ;-) Latest version uses the_hash_algo->hexsz: + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ Thanks! ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v5 1/2] fetch-pack: redact packfile urls in traces 2021-10-29 18:42 ` Ivan Frade @ 2021-10-29 19:59 ` Junio C Hamano 2021-11-08 22:43 ` Jonathan Tan 0 siblings, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2021-10-29 19:59 UTC (permalink / raw) To: Ivan Frade Cc: Ivan Frade via GitGitGadget, git, Ævar Arnfjörð Bjarmason, Eric Sunshine Ivan Frade <ifrade@google.com> writes: > On Thu, Oct 28, 2021 at 4:21 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: >> > >> > + len = strspn(buffer, "0123456789abcdefABCDEF"); >> > + if (!(len == 40 || len == 64) || buffer[len] != ' ') >> > + return NULL; /* required "<hash>SP" not seen */ >> >> People may have comments on hardcoded 40/64 here and offer a better >> way to write it ;-) > > Latest version uses the_hash_algo->hexsz: > > + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') > + return NULL; /* required "<hash>SP" not seen */ > > Thanks! OK. If the <hash> is given by this side (as opposed to "you started to talk to a remote, and it turns out that you are still talking SHA-1 but the other side talks SHA-256 and their <hash> size that is 64 does not match your 40" case), then checking against the_hash_algo->hexsz should be sufficient. The original suggestion was tried both because I didn't know where <hash> originates, and we would want to redact even in such a hash type mismatch case. Thanks. Will take a look at the updated one. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v5 1/2] fetch-pack: redact packfile urls in traces 2021-10-29 19:59 ` Junio C Hamano @ 2021-11-08 22:43 ` Jonathan Tan 0 siblings, 0 replies; 43+ messages in thread From: Jonathan Tan @ 2021-11-08 22:43 UTC (permalink / raw) To: gitster; +Cc: ifrade, gitgitgadget, git, avarab, sunshine, Jonathan Tan > OK. If the <hash> is given by this side (as opposed to "you started > to talk to a remote, and it turns out that you are still talking > SHA-1 but the other side talks SHA-256 and their <hash> size that is > 64 does not match your 40" case), then checking against > the_hash_algo->hexsz should be sufficient. The original suggestion > was tried both because I didn't know where <hash> originates, and we > would want to redact even in such a hash type mismatch case. > > Thanks. Will take a look at the updated one. This is when reading from the remote, so <hash> comes from the other side. I don't think that the remote sending the wrong hash size (and then needing to redact) is a big concern, but there is definitely no harm in checking for both (and commenting that these are the SHA-1 and SHA-256 hash sizes). ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v5 2/2] http-fetch: redact url on die() message 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget @ 2021-10-28 22:51 ` Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 0 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-28 22:51 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> http-fetch prints the URL after failing to fetch it. This can be confusing to users (they cannot really do anything with it), and they can share by accident a sensitive URL (e.g. with credentials) while looking for help. Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This mimics the redaction of other sensitive information in git, like the Authorization header in HTTP. Fix also capitalization of previous die() message (must start in lowercase). Signed-off-by: Ivan Frade <ifrade@google.com> --- http-fetch.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/http-fetch.c b/http-fetch.c index fa642462a9e..c7c7d391ac5 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -4,6 +4,7 @@ #include "http.h" #include "walker.h" #include "strvec.h" +#include "urlmatch.h" static const char http_fetch_usage[] = "git http-fetch " "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url"; @@ -63,8 +64,17 @@ static void fetch_single_packfile(struct object_id *packfile_hash, if (start_active_slot(preq->slot)) { run_active_slot(preq->slot); if (results.curl_result != CURLE_OK) { - die("Unable to get pack file %s\n%s", preq->url, - curl_errorstr); + struct url_info url; + char *nurl = url_normalize(preq->url, &url); + if (!nurl || !git_env_bool("GIT_TRACE_REDACT", 1)) { + die("unable to get pack file '%s'\n%s", preq->url, + curl_errorstr); + } else { + die("failed to get '%.*s' url from '%.*s' " + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", + (int)url.scheme_len, url.url, + (int)url.host_len, &url.url[url.host_off], curl_errorstr); + } } } else { die("Unable to start request"); -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v6 0/2] fetch-pack: redact packfile urls in traces 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-10-29 18:42 ` Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget ` (2 more replies) 2 siblings, 3 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-29 18:42 UTC (permalink / raw) To: git; +Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade Changes since v5: * Use hexsz instead of hardcoded hash sizes Changes since v4: * Use "uri" instead of "url" * Look specifically for a line with packfile-uri format (instead of for a URL in general) * Limit the redacting to the packfile-uri section in do_fetch_pack_v2 * Use "%.*s" instead of duplicating parts of the string to print Changes since v3: * Enable redacting URLs for all sections * Redact only URL path (it was until the end of line) * Redact URL in die() with more friendly message * Update doc to mention that packfile URIs are also redacted. Changes since v2: * Redact only the path of the URL * Test are now strict, validating the exact line expected in the log Changes since v1: * Removed non-POSIX flags in tests * More accurate regex for the non-encrypted packfile line * Dropped documentation change * Dropped redacting the die message in http-fetch Ivan Frade (2): fetch-pack: redact packfile urls in traces http-fetch: redact url on die() message Documentation/git.txt | 5 +++-- fetch-pack.c | 4 ++++ http-fetch.c | 14 ++++++++++-- pkt-line.c | 39 +++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 5 deletions(-) base-commit: e9e5ba39a78c8f5057262d49e261b42a8660d5b9 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v6 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v6 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v5: 1: c95b3cafcd6 ! 1: a6098f98946 fetch-pack: redact packfile urls in traces @@ pkt-line.c: int packet_length(const char lenbuf_hex[4]) + buffer += 1; + + len = strspn(buffer, "0123456789abcdefABCDEF"); -+ if (!(len == 40 || len == 64) || buffer[len] != ' ') ++ if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ + + path = strstr(buffer + len + 1, URI_MARK); 2: 6912a690197 = 2: 38859ae7b7d http-fetch: redact url on die() message -- gitgitgadget ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget @ 2021-10-29 18:42 ` Ivan Frade via GitGitGadget 2021-11-08 23:01 ` Jonathan Tan 2021-10-29 18:42 ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-29 18:42 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- Documentation/git.txt | 5 +++-- fetch-pack.c | 4 ++++ pkt-line.c | 39 +++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 97 insertions(+), 3 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index d63c65e67d8..c91aa2737f0 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -832,8 +832,9 @@ for full details. `GIT_TRACE_REDACT`:: By default, when tracing is activated, Git redacts the values of - cookies, the "Authorization:" header, and the "Proxy-Authorization:" - header. Set this variable to `0` to prevent this redaction. + cookies, the "Authorization:" header, the "Proxy-Authorization:" + header and packfile URIs. Set this variable to `0` to prevent this + redaction. `GIT_LITERAL_PATHSPECS`:: Setting this variable to `1` will cause Git to treat all diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..62ea90541c5 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, receive_wanted_refs(&reader, sought, nr_sought); /* get the pack(s) */ + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader.options |= PACKET_READ_REDACT_URI_PATH; if (process_section_header(&reader, "packfile-uris", 1)) receive_packfile_uris(&reader, &packfile_uris); + reader.options &= ~PACKET_READ_REDACT_URI_PATH; + process_section_header(&reader, "packfile", 0); /* diff --git a/pkt-line.c b/pkt-line.c index 2dc8ac274bd..5a69ddc2e77 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } +static char *find_packfile_uri_path(const char *buffer) +{ + const char *URI_MARK = "://"; + char *path; + int len; + + /* First char is sideband mark */ + buffer += 1; + + len = strspn(buffer, "0123456789abcdefABCDEF"); + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ + + path = strstr(buffer + len + 1, URI_MARK); + if (!path) + return NULL; + + path = strchr(path + strlen(URI_MARK), '/'); + if (!path || !*(path + 1)) + return NULL; + + /* position after '/' */ + return ++path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int *pktlen, @@ -377,6 +402,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, { int len; char linelen[4]; + char *uri_path_start; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ -427,7 +453,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URI_PATH && + (uri_path_start = find_packfile_uri_path(buffer))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); + strbuf_splice(&tracebuf, uri_path_start - buffer, + strlen(uri_path_start), redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 467ae013573..6d2a63db238 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -87,6 +87,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URI_PATH (1u<<4) int packet_read(int fd, char *buffer, unsigned size, int options); /* diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..f01af2f2ed3 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget @ 2021-11-08 23:01 ` Jonathan Tan 2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason 2021-11-10 21:18 ` Ivan Frade 0 siblings, 2 replies; 43+ messages in thread From: Jonathan Tan @ 2021-11-08 23:01 UTC (permalink / raw) To: gitgitgadget; +Cc: git, avarab, sunshine, ifrade, Jonathan Tan > diff --git a/fetch-pack.c b/fetch-pack.c > index a9604f35a3e..62ea90541c5 100644 > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, > receive_wanted_refs(&reader, sought, nr_sought); > > /* get the pack(s) */ > + if (git_env_bool("GIT_TRACE_REDACT", 1)) > + reader.options |= PACKET_READ_REDACT_URI_PATH; > if (process_section_header(&reader, "packfile-uris", 1)) > receive_packfile_uris(&reader, &packfile_uris); > + reader.options &= ~PACKET_READ_REDACT_URI_PATH; Probably worth commenting why you're resetting the flag (avoid the relatively expensive URI check when we don't need it). > diff --git a/pkt-line.c b/pkt-line.c > index 2dc8ac274bd..5a69ddc2e77 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) > return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); > } > > +static char *find_packfile_uri_path(const char *buffer) > +{ > + const char *URI_MARK = "://"; > + char *path; > + int len; > + > + /* First char is sideband mark */ > + buffer += 1; > + > + len = strspn(buffer, "0123456789abcdefABCDEF"); > + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') > + return NULL; /* required "<hash>SP" not seen */ Optional: As I said in my reply (just sent out), checking for both SHA-1 and SHA-256 lengths is reasonable too. [1] https://lore.kernel.org/git/20211108224335.569596-1-jonathantanmy@google.com/ > diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh > index d527cf6c49f..f01af2f2ed3 100755 > --- a/t/t5702-protocol-v2.sh > +++ b/t/t5702-protocol-v2.sh > @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul > test_i18ngrep "disallowed submodule name" err > ' > > +test_expect_success 'packfile-uri path redacted in trace' ' > + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && > + rm -rf "$P" http_child log && > + > + git init "$P" && > + git -C "$P" config "uploadpack.allowsidebandall" "true" && > + > + echo my-blob >"$P/my-blob" && > + git -C "$P" add my-blob && > + git -C "$P" commit -m x && > + > + git -C "$P" hash-object my-blob >objh && > + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && > + git -C "$P" config --add \ > + "uploadpack.blobpackfileuri" \ > + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && > + > + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ No need for GIT_TRACE=1 since you're not checking stdout. Also I don't think GIT_TEST_SIDEBAND_ALL=1 is needed - we should check that it works even without a test variable (and I've checked and it seems to work). [snip] > +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' > + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && > + rm -rf "$P" http_child log && > + > + git init "$P" && > + git -C "$P" config "uploadpack.allowsidebandall" "true" && > + > + echo my-blob >"$P/my-blob" && > + git -C "$P" add my-blob && > + git -C "$P" commit -m x && > + > + git -C "$P" hash-object my-blob >objh && > + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && > + git -C "$P" config --add \ > + "uploadpack.blobpackfileuri" \ > + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && > + > + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ Same comment here. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-11-08 23:01 ` Jonathan Tan @ 2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason 2021-11-10 23:44 ` Ivan Frade 2021-11-10 21:18 ` Ivan Frade 1 sibling, 1 reply; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-09 1:36 UTC (permalink / raw) To: Jonathan Tan; +Cc: gitgitgadget, git, sunshine, ifrade On Mon, Nov 08 2021, Jonathan Tan wrote: >> diff --git a/fetch-pack.c b/fetch-pack.c >> index a9604f35a3e..62ea90541c5 100644 >> --- a/fetch-pack.c >> +++ b/fetch-pack.c >> @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, >> receive_wanted_refs(&reader, sought, nr_sought); >> >> /* get the pack(s) */ >> + if (git_env_bool("GIT_TRACE_REDACT", 1)) >> + reader.options |= PACKET_READ_REDACT_URI_PATH; >> if (process_section_header(&reader, "packfile-uris", 1)) >> receive_packfile_uris(&reader, &packfile_uris); >> + reader.options &= ~PACKET_READ_REDACT_URI_PATH; > > Probably worth commenting why you're resetting the flag (avoid the > relatively expensive URI check when we don't need it). ...yeah... >> diff --git a/pkt-line.c b/pkt-line.c >> index 2dc8ac274bd..5a69ddc2e77 100644 >> --- a/pkt-line.c >> +++ b/pkt-line.c >> @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) >> return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); >> } >> >> +static char *find_packfile_uri_path(const char *buffer) >> +{ >> + const char *URI_MARK = "://"; >> + char *path; >> + int len; >> + >> + /* First char is sideband mark */ >> + buffer += 1; >> + >> + len = strspn(buffer, "0123456789abcdefABCDEF"); >> + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') >> + return NULL; /* required "<hash>SP" not seen */ > > Optional: As I said in my reply (just sent out), checking for both SHA-1 > and SHA-256 lengths is reasonable too. > > [1] https://lore.kernel.org/git/20211108224335.569596-1-jonathantanmy@google.com/ Correct me if I'm wrong, but I find it really strange that we're trying to parse things in pkt-line.c like this. We'll only get these from a client in code that's invoked in fetch-pack.c, specifically the process_section_header() quoted above, no? From there we'll call packet_reader_read(), which will call packet_read_with_status(), and from there we'll call packet_trace(). Then right after all this happens we've got a loop that parses out these packfile URIs, including this being-done-first-here parsing of the hex value just for logging, except in pkt-line.c we've lost the information about what hash algorithm length we should be using, which fetch-pack.c of course needs to know. Why can't that process_section_header() in fetch-pack.c just be made to call some pkt-line.c API saying "don't log yet", i.e. something like this pseudocode: diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..31f5ee7fc6b 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1518,14 +1518,18 @@ static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { + struct string_list log = STRING_LIST_INIT_DUP; + process_section_header(reader, "packfile-uris", 0); - while (packet_reader_read(reader) == PACKET_READ_NORMAL) { + while (packet_reader_read_log_to(reader, &log) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || reader->line[the_hash_algo->hexsz] != ' ') die("expected '<hash> <uri>', got: %s\n", reader->line); + /* move the parsing of the URLs here */ string_list_append(uris, reader->line); } + log_stuff(&log); if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } I.e. we'll eventually call trace_strbuf() in pkt-line.c, and we know that we're doing these packfile-uris, and we know that we're just about to parse them. Let's just: 1. Start reading the section 2. Turn off tracing 3. Parse the URIs as we go 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again Instead of: 1. Set a flag to scrub stuff 2. Because of the disconnect between fetch-pack.c and pkt-line.c, effectively implement a new parser for data we're already going to be parsing some microseconds later during the course of the request. That "turn off the trace" could be passing down a string_list/strbuf, or even doing the same via a nev member in "struct packet_reader", both would be simpler than needing to re-do the parse. Probably simplest is just: struct string_list log = STRING_LIST_INIT_DUP; reader.deferred_trace = &log; /* packet_reader_read() etc. code, unchanged from now */ /* parse URIs (just move the existing code around a bit) */ packet_reader.deferred_trace = NULL; for_each...(item, &log) trace_strbuf(...); ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason @ 2021-11-10 23:44 ` Ivan Frade 2021-11-11 0:01 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 43+ messages in thread From: Ivan Frade @ 2021-11-10 23:44 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jonathan Tan, gitgitgadget, git, sunshine On Mon, Nov 8, 2021 at 5:53 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > ... >... Let's just: > > 1. Start reading the section > 2. Turn off tracing > 3. Parse the URIs as we go > 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again This is a more generic redacting mechanism, but I understood that there is no need for it. Previous comments went in the direction of removing generality (e.g. not looking for a URI anywhere in the packet, but specifically for the packfile line format) and now this patch is very specific to redact packfile-uri lines in the protocol. > Instead of: > > 1. Set a flag to scrub stuff > 2. Because of the disconnect between fetch-pack.c and pkt-line.c, > effectively implement a new parser for data we're already going to be > parsing some microseconds later during the course of the request. pkt-line is only looking for the "<n-hex-chars>SP" shape. True that it encodes some protocol knowledge, but it is hardly a new parser. > That "turn off the trace" could be passing down a string_list/strbuf, or > even doing the same via a nev member in "struct packet_reader", both > would be simpler than needing to re-do the parse. Saving the lines and delaying the tracing could also produce weird outputs, no? e.g. 3 lines received, the second doesn't validate, the program aborts and the trace doesn't show any of the lines that caused the problem. Or we would need to iterate in parallel through lines and saved-log-lines assuming they match 1:1. Nothing unsolvable, but I am not sure it is worthy the effort now. Thanks, ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-11-10 23:44 ` Ivan Frade @ 2021-11-11 0:01 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 43+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-11 0:01 UTC (permalink / raw) To: Ivan Frade; +Cc: Jonathan Tan, gitgitgadget, git, sunshine On Wed, Nov 10 2021, Ivan Frade wrote: > On Mon, Nov 8, 2021 at 5:53 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: >> > ... >>... Let's just: >> >> 1. Start reading the section >> 2. Turn off tracing >> 3. Parse the URIs as we go >> 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again > > This is a more generic redacting mechanism, but I understood that > there is no need for it. Previous comments went in the direction of > removing generality (e.g. not looking for a URI anywhere in the > packet, but specifically for the packfile line format) and now this > patch is very specific to redact packfile-uri lines in the protocol. It's less generic, because it would live in the loop that consumes the lines. >> Instead of: >> >> 1. Set a flag to scrub stuff >> 2. Because of the disconnect between fetch-pack.c and pkt-line.c, >> effectively implement a new parser for data we're already going to be >> parsing some microseconds later during the course of the request. > > pkt-line is only looking for the "<n-hex-chars>SP" shape. True that it > encodes some protocol knowledge, but it is hardly a new parser. Yeah, but why have find_packfile_uri_path() at all instead of just moving the parsing code around? We've already got the code that parses these lines, it's just a few lines removed from the code you're adding... >> That "turn off the trace" could be passing down a string_list/strbuf, or >> even doing the same via a nev member in "struct packet_reader", both >> would be simpler than needing to re-do the parse. > > Saving the lines and delaying the tracing could also produce weird > outputs, no? e.g. 3 lines received, the second doesn't validate, the > program aborts and the trace doesn't show any of the lines that caused > the problem. Or we would need to iterate in parallel through lines and > saved-log-lines assuming they match 1:1. Nothing unsolvable, but I am > not sure it is worthy the effort now. It would only be weird if you do : download_later = while (consume lines) download_later += buffer_lines; log lines; I'm suggesting: download_later = while (consume lines) raw, to_log = parse line log line(to_log) download_later += raw Sure, you'll need to do something in the case where the line doesn't validate, should you redact it still, or log it as is? Anyway, that's also a caveat you've got now. That's not iterating in parallel, having one for-loop instead of two. I see now that that approach would also solve at least one bug/misfeature in the packfile-uri handling, i.e.: for (i = 0; i < packfile_uris.nr; i++) { [...] start_command(...) [... to download the URI ...] [...] die("fetch-pack: pack downloaded from %s does not match expected hash %.*s", } I.e. we've already received all the URIs, but then do validation on them one at a time, so we might only notice that the server has sent us bad data for the Nth URI after first downloading the first N-1 URIs. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces 2021-11-08 23:01 ` Jonathan Tan 2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason @ 2021-11-10 21:18 ` Ivan Frade 1 sibling, 0 replies; 43+ messages in thread From: Ivan Frade @ 2021-11-10 21:18 UTC (permalink / raw) To: Jonathan Tan; +Cc: gitgitgadget, git, avarab, sunshine On Mon, Nov 8, 2021 at 3:01 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > > + reader.options &= ~PACKET_READ_REDACT_URI_PATH; > > Probably worth commenting why you're resetting the flag (avoid the > relatively expensive URI check when we don't need it). Done > > > diff --git a/pkt-line.c b/pkt-line.c ... > > + len = strspn(buffer, "0123456789abcdefABCDEF"); > > + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') > > + return NULL; /* required "<hash>SP" not seen */ > > Optional: As I said in my reply (just sent out), checking for both SHA-1 > and SHA-256 lengths is reasonable too. Done (with a comment indicating they are the hash sizes of SHA1 and SHA256) > > + GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ > > No need for GIT_TRACE=1 since you're not checking stdout. Also I don't > think GIT_TEST_SIDEBAND_ALL=1 is needed - we should check that it works > even without a test variable (and I've checked and it seems to work). Done in both tests. Thanks, ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v6 2/2] http-fetch: redact url on die() message 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget @ 2021-10-29 18:42 ` Ivan Frade via GitGitGadget 2021-11-08 23:06 ` Jonathan Tan 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2 siblings, 1 reply; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-10-29 18:42 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> http-fetch prints the URL after failing to fetch it. This can be confusing to users (they cannot really do anything with it), and they can share by accident a sensitive URL (e.g. with credentials) while looking for help. Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This mimics the redaction of other sensitive information in git, like the Authorization header in HTTP. Fix also capitalization of previous die() message (must start in lowercase). Signed-off-by: Ivan Frade <ifrade@google.com> --- http-fetch.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/http-fetch.c b/http-fetch.c index fa642462a9e..c7c7d391ac5 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -4,6 +4,7 @@ #include "http.h" #include "walker.h" #include "strvec.h" +#include "urlmatch.h" static const char http_fetch_usage[] = "git http-fetch " "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url"; @@ -63,8 +64,17 @@ static void fetch_single_packfile(struct object_id *packfile_hash, if (start_active_slot(preq->slot)) { run_active_slot(preq->slot); if (results.curl_result != CURLE_OK) { - die("Unable to get pack file %s\n%s", preq->url, - curl_errorstr); + struct url_info url; + char *nurl = url_normalize(preq->url, &url); + if (!nurl || !git_env_bool("GIT_TRACE_REDACT", 1)) { + die("unable to get pack file '%s'\n%s", preq->url, + curl_errorstr); + } else { + die("failed to get '%.*s' url from '%.*s' " + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", + (int)url.scheme_len, url.url, + (int)url.host_len, &url.url[url.host_off], curl_errorstr); + } } } else { die("Unable to start request"); -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v6 2/2] http-fetch: redact url on die() message 2021-10-29 18:42 ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-11-08 23:06 ` Jonathan Tan 0 siblings, 0 replies; 43+ messages in thread From: Jonathan Tan @ 2021-11-08 23:06 UTC (permalink / raw) To: gitgitgadget; +Cc: git, avarab, sunshine, ifrade, Jonathan Tan > @@ -63,8 +64,17 @@ static void fetch_single_packfile(struct object_id *packfile_hash, > if (start_active_slot(preq->slot)) { > run_active_slot(preq->slot); > if (results.curl_result != CURLE_OK) { > - die("Unable to get pack file %s\n%s", preq->url, > - curl_errorstr); > + struct url_info url; > + char *nurl = url_normalize(preq->url, &url); > + if (!nurl || !git_env_bool("GIT_TRACE_REDACT", 1)) { > + die("unable to get pack file '%s'\n%s", preq->url, > + curl_errorstr); > + } else { > + die("failed to get '%.*s' url from '%.*s' " > + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", > + (int)url.scheme_len, url.url, > + (int)url.host_len, &url.url[url.host_off], curl_errorstr); > + } I was confused why nurl was set but never used in "else", but I see that it's because url_normalize() also sets that value in the urlinfo struct. This patch looks good (and patch 1 too, with my suggested changes). ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v7 0/2] fetch-pack: redact packfile urls in traces 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-11-10 23:51 ` Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget ` (2 more replies) 2 siblings, 3 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-11-10 23:51 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Jonathan Tan, Ivan Frade Changes since v6: * Use specific hash sizes instead of hexsz * Remove unnecessary env vars in tests * Added comment on bit toggle Changes since v5: * Use hexsz instead of hardcoded hash sizes Changes since v4: * Use "uri" instead of "url" * Look specifically for a line with packfile-uri format (instead of for a URL in general) * Limit the redacting to the packfile-uri section in do_fetch_pack_v2 * Use "%.*s" instead of duplicating parts of the string to print Changes since v3: * Enable redacting URLs for all sections * Redact only URL path (it was until the end of line) * Redact URL in die() with more friendly message * Update doc to mention that packfile URIs are also redacted. Changes since v2: * Redact only the path of the URL * Test are now strict, validating the exact line expected in the log Changes since v1: * Removed non-POSIX flags in tests * More accurate regex for the non-encrypted packfile line * Dropped documentation change * Dropped redacting the die message in http-fetch Ivan Frade (2): fetch-pack: redact packfile urls in traces http-fetch: redact url on die() message Documentation/git.txt | 5 +++-- fetch-pack.c | 5 +++++ http-fetch.c | 14 ++++++++++-- pkt-line.c | 40 ++++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 111 insertions(+), 5 deletions(-) base-commit: 88d915a634b449147855041d44875322de2b286d Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1052%2Fifradeo%2Fredact-packfile-uri-v7 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1052/ifradeo/redact-packfile-uri-v7 Pull-Request: https://github.com/gitgitgadget/git/pull/1052 Range-diff vs v6: 1: a6098f98946 ! 1: bbfdc346ede fetch-pack: redact packfile urls in traces @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, + reader.options |= PACKET_READ_REDACT_URI_PATH; if (process_section_header(&reader, "packfile-uris", 1)) receive_packfile_uris(&reader, &packfile_uris); ++ /* We don't expect more URIs. Reset to avoid expensive URI check. */ + reader.options &= ~PACKET_READ_REDACT_URI_PATH; + process_section_header(&reader, "packfile", 0); @@ pkt-line.c: int packet_length(const char lenbuf_hex[4]) + buffer += 1; + + len = strspn(buffer, "0123456789abcdefABCDEF"); -+ if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') ++ /* size of SHA1 and SHA256 hash */ ++ if (!(len == 40 || len == 64) || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ + + path = strstr(buffer + len + 1, URI_MARK); @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + -+ GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ ++ GIT_TRACE_PACKET="$(pwd)/log" \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + -+ GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \ ++ GIT_TRACE_PACKET="$(pwd)/log" \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ 2: 38859ae7b7d = 2: 3b210735bc8 http-fetch: redact url on die() message -- gitgitgadget ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v7 1/2] fetch-pack: redact packfile urls in traces 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget @ 2021-11-10 23:51 ` Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-11-12 4:43 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano 2 siblings, 0 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-11-10 23:51 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Jonathan Tan, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> --- Documentation/git.txt | 5 +++-- fetch-pack.c | 5 +++++ pkt-line.c | 40 ++++++++++++++++++++++++++++++++- pkt-line.h | 1 + t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 99 insertions(+), 3 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index 281c5f8caef..13f83a2a3a1 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -832,8 +832,9 @@ for full details. `GIT_TRACE_REDACT`:: By default, when tracing is activated, Git redacts the values of - cookies, the "Authorization:" header, and the "Proxy-Authorization:" - header. Set this variable to `0` to prevent this redaction. + cookies, the "Authorization:" header, the "Proxy-Authorization:" + header and packfile URIs. Set this variable to `0` to prevent this + redaction. `GIT_LITERAL_PATHSPECS`:: Setting this variable to `1` will cause Git to treat all diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..8b8c75f33aa 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1653,8 +1653,13 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, receive_wanted_refs(&reader, sought, nr_sought); /* get the pack(s) */ + if (git_env_bool("GIT_TRACE_REDACT", 1)) + reader.options |= PACKET_READ_REDACT_URI_PATH; if (process_section_header(&reader, "packfile-uris", 1)) receive_packfile_uris(&reader, &packfile_uris); + /* We don't expect more URIs. Reset to avoid expensive URI check. */ + reader.options &= ~PACKET_READ_REDACT_URI_PATH; + process_section_header(&reader, "packfile", 0); /* diff --git a/pkt-line.c b/pkt-line.c index 2dc8ac274bd..8e43c2def4c 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -370,6 +370,32 @@ int packet_length(const char lenbuf_hex[4]) return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); } +static char *find_packfile_uri_path(const char *buffer) +{ + const char *URI_MARK = "://"; + char *path; + int len; + + /* First char is sideband mark */ + buffer += 1; + + len = strspn(buffer, "0123456789abcdefABCDEF"); + /* size of SHA1 and SHA256 hash */ + if (!(len == 40 || len == 64) || buffer[len] != ' ') + return NULL; /* required "<hash>SP" not seen */ + + path = strstr(buffer + len + 1, URI_MARK); + if (!path) + return NULL; + + path = strchr(path + strlen(URI_MARK), '/'); + if (!path || !*(path + 1)) + return NULL; + + /* position after '/' */ + return ++path; +} + enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len, char *buffer, unsigned size, int *pktlen, @@ -377,6 +403,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, { int len; char linelen[4]; + char *uri_path_start; if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) { *pktlen = -1; @@ -427,7 +454,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, len--; buffer[len] = 0; - packet_trace(buffer, len, 0); + if (options & PACKET_READ_REDACT_URI_PATH && + (uri_path_start = find_packfile_uri_path(buffer))) { + const char *redacted = "<redacted>"; + struct strbuf tracebuf = STRBUF_INIT; + strbuf_insert(&tracebuf, 0, buffer, len); + strbuf_splice(&tracebuf, uri_path_start - buffer, + strlen(uri_path_start), redacted, strlen(redacted)); + packet_trace(tracebuf.buf, tracebuf.len, 0); + strbuf_release(&tracebuf); + } else { + packet_trace(buffer, len, 0); + } if ((options & PACKET_READ_DIE_ON_ERR_PACKET) && starts_with(buffer, "ERR ")) diff --git a/pkt-line.h b/pkt-line.h index 467ae013573..6d2a63db238 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -87,6 +87,7 @@ void packet_fflush(FILE *f); #define PACKET_READ_CHOMP_NEWLINE (1u<<1) #define PACKET_READ_DIE_ON_ERR_PACKET (1u<<2) #define PACKET_READ_GENTLE_ON_READ_ERROR (1u<<3) +#define PACKET_READ_REDACT_URI_PATH (1u<<4) int packet_read(int fd, char *buffer, unsigned size, int options); /* diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index d527cf6c49f..78f85b0714a 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -1107,6 +1107,57 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodul test_i18ngrep "disallowed submodule name" err ' +test_expect_success 'packfile-uri path redacted in trace' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE_PACKET="$(pwd)/log" \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/<redacted>" log +' + +test_expect_success 'packfile-uri path not redacted in trace when GIT_TRACE_REDACT=0' ' + P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && + rm -rf "$P" http_child log && + + git init "$P" && + git -C "$P" config "uploadpack.allowsidebandall" "true" && + + echo my-blob >"$P/my-blob" && + git -C "$P" add my-blob && + git -C "$P" commit -m x && + + git -C "$P" hash-object my-blob >objh && + git -C "$P" pack-objects "$HTTPD_DOCUMENT_ROOT_PATH/mypack" <objh >packh && + git -C "$P" config --add \ + "uploadpack.blobpackfileuri" \ + "$(cat objh) $(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" && + + GIT_TRACE_PACKET="$(pwd)/log" \ + GIT_TRACE_REDACT=0 \ + git -c protocol.version=2 \ + -c fetch.uriprotocols=http,https \ + clone "$HTTPD_URL/smart/http_parent" http_child && + + grep -F "clone< \\1$(cat packh) $HTTPD_URL/dumb/mypack-$(cat packh).pack" log +' + test_expect_success 'http:// --negotiate-only' ' SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && URI="$HTTPD_URL/smart/server" && -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v7 2/2] http-fetch: redact url on die() message 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget @ 2021-11-10 23:51 ` Ivan Frade via GitGitGadget 2021-11-12 4:43 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano 2 siblings, 0 replies; 43+ messages in thread From: Ivan Frade via GitGitGadget @ 2021-11-10 23:51 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Eric Sunshine, Jonathan Tan, Ivan Frade, Ivan Frade From: Ivan Frade <ifrade@google.com> http-fetch prints the URL after failing to fetch it. This can be confusing to users (they cannot really do anything with it), and they can share by accident a sensitive URL (e.g. with credentials) while looking for help. Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This mimics the redaction of other sensitive information in git, like the Authorization header in HTTP. Fix also capitalization of previous die() message (must start in lowercase). Signed-off-by: Ivan Frade <ifrade@google.com> --- http-fetch.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/http-fetch.c b/http-fetch.c index fa642462a9e..c7c7d391ac5 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -4,6 +4,7 @@ #include "http.h" #include "walker.h" #include "strvec.h" +#include "urlmatch.h" static const char http_fetch_usage[] = "git http-fetch " "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url"; @@ -63,8 +64,17 @@ static void fetch_single_packfile(struct object_id *packfile_hash, if (start_active_slot(preq->slot)) { run_active_slot(preq->slot); if (results.curl_result != CURLE_OK) { - die("Unable to get pack file %s\n%s", preq->url, - curl_errorstr); + struct url_info url; + char *nurl = url_normalize(preq->url, &url); + if (!nurl || !git_env_bool("GIT_TRACE_REDACT", 1)) { + die("unable to get pack file '%s'\n%s", preq->url, + curl_errorstr); + } else { + die("failed to get '%.*s' url from '%.*s' " + "(full URL redacted due to GIT_TRACE_REDACT setting)\n%s", + (int)url.scheme_len, url.url, + (int)url.host_len, &url.url[url.host_off], curl_errorstr); + } } } else { die("Unable to start request"); -- gitgitgadget ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v7 0/2] fetch-pack: redact packfile urls in traces 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget @ 2021-11-12 4:43 ` Junio C Hamano 2 siblings, 0 replies; 43+ messages in thread From: Junio C Hamano @ 2021-11-12 4:43 UTC (permalink / raw) To: Ivan Frade via GitGitGadget Cc: git, Ævar Arnfjörð Bjarmason, Eric Sunshine, Jonathan Tan, Ivan Frade "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes: > Changes since v6: > > * Use specific hash sizes instead of hexsz > * Remove unnecessary env vars in tests > * Added comment on bit toggle > Ivan Frade (2): > fetch-pack: redact packfile urls in traces > http-fetch: redact url on die() message Thanks, will queue. ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2021-11-12 4:43 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget 2021-10-08 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-08 23:15 ` Ivan Frade 2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget 2021-10-08 19:43 ` Ævar Arnfjörð Bjarmason 2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-11 20:39 ` Junio C Hamano 2021-10-26 19:32 ` Ivan Frade 2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget 2021-10-20 11:41 ` Ævar Arnfjörð Bjarmason 2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget 2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget 2021-10-28 1:01 ` Junio C Hamano 2021-10-28 22:15 ` Ivan Frade 2021-10-28 22:46 ` Junio C Hamano 2021-10-26 22:49 ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason 2021-10-28 17:25 ` Eric Sunshine 2021-10-28 22:44 ` Ivan Frade 2021-10-28 22:41 ` Ivan Frade 2021-10-29 23:18 ` Junio C Hamano 2021-11-09 1:54 ` Ævar Arnfjörð Bjarmason 2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget 2021-10-28 23:21 ` Junio C Hamano 2021-10-29 18:42 ` Ivan Frade 2021-10-29 19:59 ` Junio C Hamano 2021-11-08 22:43 ` Jonathan Tan 2021-10-28 22:51 ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget 2021-11-08 23:01 ` Jonathan Tan 2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason 2021-11-10 23:44 ` Ivan Frade 2021-11-11 0:01 ` Ævar Arnfjörð Bjarmason 2021-11-10 21:18 ` Ivan Frade 2021-10-29 18:42 ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-11-08 23:06 ` Jonathan Tan 2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget 2021-11-10 23:51 ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget 2021-11-12 4:43 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.