From: Augie Fackler <augie@google.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: git@vger.kernel.org, Stefan Beller <sbeller@google.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2] fetch-pack: optionally save packs to disk
Date: Fri, 12 Jun 2015 12:54:24 -0400
Message-ID: <CAHcr6HZsevG+RyRb3+hghgeRhMY7kmcRZBwt1D8JWn0RfSuHSQ@mail.gmail.com>
In-Reply-To: <557A7ABA.2000404@kdbg.org>

On Fri, Jun 12, 2015 at 2:22 AM, Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 11.06.2015 um 20:59 schrieb Augie Fackler:
>>
>> When developing server software, it's often helpful to save a
>> potentially-bogus pack for later analysis. This makes that trivial,
>> instead of painful.
>
>
> When you develop server software, shouldn't you test drive the server via
> the bare metal protocol anyway? That *is* painful, but unavoidable because
> you must harden the server against any garbage that a potentially malicious
> client could throw at it. Restricting yourself to a well-behaved client
> such as fetch-pack is only half the deal.


We do that too, but sometimes I've encountered an edge case that's
trivially reproduced from an existing repo, and going through the work to
manually drive the server is a monumental pain in the butt, and all I
*really* need is to see the bytes sent from the server to the client. If it
weren't for SSL-everywhere, I'd probably just do this with wireshark, but
that's not the world I live in.

>
>
> That said, I do think that fetch-pack could learn a mode that makes it
> easier to debug the normal behavior of a server (if such a mode is missing
> currently).
>
> What is the problem with the current fetch-pack implementation? Does it
> remove a bogus packfile after download? Does it abort during download when
> it detects a broken packfile? Does --keep not do what you need?


fetch-pack doesn't store the pack anywhere - it's sending it to index-pack
(or unpack-objects) using --stdin, which means that the raw bytes from the
server currently are never materialized anywhere on disk. Having index-pack
do this is too late, because it's doing things like rewriting the pack
header in a potentially new format.
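
To make "materialize the raw bytes" concrete, here is a minimal standalone
sketch of a tee-style copy loop. It is not code from the patch or from git;
tee_stream and its parameters are hypothetical names, and it uses plain stdio
rather than git's own I/O helpers.

```c
#include <stdio.h>

/*
 * Copy everything readable from `in` to both `out` and `save`, so that
 * `save` ends up holding exactly the bytes that were passed along.
 * Returns 0 on success, -1 on a read or write error.
 * (Hypothetical helper for illustration; nothing here is git API.)
 */
int tee_stream(FILE *in, FILE *out, FILE *save)
{
	char buf[8192];
	size_t n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
		if (fwrite(buf, 1, n, save) != n)
			return -1;
	}
	if (ferror(in))
		return -1;
	/* Flush both sinks so a caller can inspect the saved copy right away. */
	if (fflush(out) || fflush(save))
		return -1;
	return 0;
}
```

The point of copying before any consumer touches the stream is that the saved
file reflects what the server sent, not what a later stage rewrote.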

(Junio also covered this well, thanks!)

>
>
> Instead of your approach (which forks off tee to dump a copy of the
> packfile), would it not be simpler to add an option --debug-pack (probably
> not the best name) that skips the cleanup step when a broken packfile is
> detected and prints the name of the downloaded packfile?
>
>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a912935..fe6ba58 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args,
>>         const char *argv[22];
>>         char keep_arg[256];
>>         char hdr_arg[256];
>> -       const char **av, *cmd_name;
>> +       const char **av, *cmd_name, *savepath;
>>         int do_keep = args->keep_pack;
>>         struct child_process cmd = CHILD_PROCESS_INIT;
>>         int ret;
>> @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args,
>>         cmd.argv = argv;
>>         av = argv;
>>         *hdr_arg = 0;
>> +       struct pack_header header;
>>         if (!args->keep_pack && unpack_limit) {
>> -               struct pack_header header;
>> -
>>                 if (read_pack_header(demux.out, &header))
>>                         die("protocol error: bad pack header");
>>                 snprintf(hdr_arg, sizeof(hdr_arg),
>> @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args,
>>                 *av++ = "--strict";
>>         *av++ = NULL;
>>
>> -       cmd.in = demux.out;
>> +       savepath = getenv("GIT_SAVE_FETCHED_PACK_TO");
>> +       if (savepath) {
>> +               struct child_process cmd2 = CHILD_PROCESS_INIT;
>> +               const char *argv2[22];
>> +               int pipefds[2];
>> +               int e;
>> +               const char **av2;
>> +               cmd2.argv = argv2;
>> +               av2 = argv2;
>> +               *av2++ = "tee";
>> +               if (*hdr_arg) {
>> +                       /* hdr_arg being nonempty means we already read the
>> +                        * pack header from demux, so we need to drop a pack
>> +                        * header in place for tee to append to, otherwise
>> +                        * we'll end up with a broken pack on disk.
>> +                        */
>
>
>                         /*
>                          * Write multi-line comments
>                          * like this (/* on its own line)
>                          */
>
>> +                       int fp;
>> +                       struct sha1file *s;
>> +                       fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666);
>> +                       s = sha1fd_throughput(fp, savepath, NULL);
>> +                       sha1write(s, &header, sizeof(header));
>> +                       sha1flush(s);
>
>
> Are you abusing sha1write() and sha1flush() to write a byte sequence to a
> file? Is write_in_full() not sufficient?


I didn't know about write_in_full. I'm very much *not* familiar with git's
codebase - I know the protocols and formats reasonably well, but have
needed only occasionally to look at the code. That works, thanks.
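
For readers unfamiliar with the idea behind a "write in full" helper, the
sketch below shows the general technique on plain POSIX: keep calling write()
until every byte is out, retrying on short writes and EINTR. This is a
standalone approximation, not git's actual write_in_full(); the name write_all
is made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly `len` bytes from `buf` to `fd`.
 * Retries after short writes and after EINTR.
 * Returns `len` on success, -1 on error (errno is set).
 * (Hypothetical helper; an approximation of the technique, not git's code.)
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t n = write(fd, p, left);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			/* No progress: report it as an error instead of looping forever. */
			errno = ENOSPC;
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	return (ssize_t)len;
}
```

With a helper like this, putting the already-consumed 12-byte pack header
(signature, version, object count) at the start of the save file is a single
call on the open descriptor, with no checksum machinery involved.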

>
>
>
>> +                       close(fp);
>> +                       /* -a is supported by both GNU and BSD tee */
>> +                       *av2++ = "-a";
>> +               }
>> +               *av2++ = savepath;
>> +               *av2++ = NULL;
>> +               cmd2.in = demux.out;
>> +               e = pipe(pipefds);
>> +               if (e != 0)
>> +                       die("couldn't make pipe to save pack");
>
>
> start_command() can create the pipe for you. Just say cmd2.out = -1.
>
>> +               cmd2.out = pipefds[1];
>> +               cmd.in = pipefds[0];
>> +               if (start_command(&cmd2))
>> +                       die("couldn't start tee to save a pack");
>
>
> When you call start_command(), you must also call finish_command().
> start_command() prints an error message for you; you don't have to do that
> (the start_command() in the context below is a bad example).

I looked around, and there are nonzero exit paths from start_command()
which do not print an error and die, so this seems safer. It's also in line
with the vast majority of uses of start_command in the codebase, so I left
this as-is. If you've got something specific you'd like to see here
instead, do let me know (presumably I still need to check the error code
from start_command()...)

>
>
>
>> +       } else
>> +               cmd.in = demux.out;
>>         cmd.git_cmd = 1;
>>         if (start_command(&cmd))
>>                 die("fetch-pack: unable to fork off %s", cmd_name);

[snip some good comments about test cleanups, all addressed]

>
> -- Hannes
>

I'll wait to mail a v3 until I at least know what's going on with
start_command() and error checking, and perhaps until I get consensus on
the use of tee vs. something else to save the bytes from the server.

https://github.com/durin42/git/commit/save-pack has the current version of
the patch if you want to see where it stands now.

Thanks for the review!

Augie
