All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/8] http-fetch: support fetching packfiles by URL
Date: Fri, 29 May 2020 16:25:52 -0700	[thread overview]
Message-ID: <xmqqeer2xr0f.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <6b3a628719e0593893e537de0220a5e0d5460232.1590789428.git.jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 29 May 2020 15:30:15 -0700")

Jonathan Tan <jonathantanmy@google.com> writes:

> +'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>
> ...
> +--packfile::
> +	Instead of a commit id on the command line (which is not expected in
> +	this case), 'git http-fetch' fetches the packfile directly at the given
> +	URL and uses index-pack to generate corresponding .idx and .keep files.
> +	The output of index-pack is printed to stdout.

This makes sense as an external interface, I guess.

How should this interact with --stdin option, which is more like
"instead of getting a single <dest filename, object name> pair from
the command line, handle many pairs read from the standard input"
batch mode operation.  Would it be beneficial to allow unbounded
number of packfiles, not just a single one, to be fetched and
indexed by a single invocation of the command?  I suspect that given
the relatively large size of a single request for fetching a
packfile, one invocation of the command per packfile won't be too
heavy an overhead, so lack of such an orthogonality may only hurt
conceptual cleanliness, but not performance.  OK.

> -	if (argc != arg + 2 - commits_on_stdin)
> +	if (argc != arg + 2 - (commits_on_stdin || packfile))
>  		usage(http_fetch_usage);
>  	if (commits_on_stdin) {
>  		commits = walker_targets_stdin(&commit_id, &write_ref);
> +	} else if (packfile) {
> +		/* URL will be set later */

Prefer to see an empty statement spelled more explicitly, like this:

		; /* URL will be set later */

Otherwise reader would be left wondering if a line was (or lines
were) accidentally lost after this comment.

>  	} else {
>  		commit_id = (char **) &argv[arg++];
>  		commits = 1;
>  	}
>  
> +	if (packfile) {
> +		url = xstrdup(argv[arg]);
> +	} else {
> +		if (argv[arg])
> +			str_end_url_with_slash(argv[arg], &url);
> +	}
>  
>  	setup_git_directory();
>  
>  	git_config(git_default_config, NULL);
>  
>  	http_init(NULL, url, 0);
> +	if (packfile) {
> +		struct http_pack_request *preq;
> +		struct slot_results results;
> +		int ret;
> +
> +		preq = new_http_pack_request(NULL, url);
> +		if (preq == NULL)
> +			die("couldn't create http pack request");
> +		preq->slot->results = &results;
> +		preq->generate_keep = 1;
> +
> +		if (start_active_slot(preq->slot)) {
> +			run_active_slot(preq->slot);
> +			if (results.curl_result != CURLE_OK) {
> +				die("Unable to get pack file %s\n%s", preq->url,
> +				    curl_errorstr);
> +			}
> +		} else {
> +			die("Unable to start request");
> +		}
> +
> +		if ((ret = finish_http_pack_request(preq)))
> +			die("finish_http_pack_request gave result %d", ret);
> +		release_http_pack_request(preq);
> +		rc = 0;

The above probably want to be a single helper function.

The other side of if/else may also become another helper function.

That way, the flow of control would become clearer.  After all,
these two branches do not share all that much.  Only http-init and
http-cleanup and nothing else.

For that matter, even before introducing this new mode of operation,
another patch to make a preparatory move of the original logic in
this function to a helper function that would be called from the
"else" side may make it easier to see what is going on.

> diff --git a/http.c b/http.c
> index 130e9d6259..ac66215ee6 100644
> --- a/http.c
> +++ b/http.c
> @@ -2280,15 +2280,18 @@ int finish_http_pack_request(struct http_pack_request *preq)
>  	int tmpfile_fd;
>  	int ret = 0;
>  
> -	close_pack_index(p);
> +	if (p)
> +		close_pack_index(p);
>  
>  	fclose(preq->packfile);
>  	preq->packfile = NULL;
>  
> -	lst = preq->lst;
> -	while (*lst != p)
> -		lst = &((*lst)->next);
> -	*lst = (*lst)->next;
> +	if (p) {
> +		lst = preq->lst;
> +		while (*lst != p)
> +			lst = &((*lst)->next);
> +		*lst = (*lst)->next;
> +	}

This is quite ugly.  What is the original meaning of the target
field of the pack_request structure again?  A packed_git structure
that will be filled when we are done fetching the packfile from the
other side and installed in our repository?  When we are (ab)using
http_fetch code to fetch a single packfile, we do not install the
packfile into the running process, because we are only (re)using the
existing machinery as a poor-man's "curl | git index-pack --stdin"?

I do not think it is a bad idea to roll "curl | git index-pack
--stdin" our own, but I do find this an ugly way to do so.  Perhaps
a set of lower-level helper functions can be isolated out of the
existing code before this new feature is added, and then a new "just
fetch and pipe it to the index-pack" feature should be written using
these helpers but with a separate set of entry points?  Would it be
a good way to make the resulting code cleaner than this patch does?
I dunno.

> diff --git a/http.h b/http.h
> index a5b082f3ae..709dfa4c19 100644
> --- a/http.h
> +++ b/http.h
> @@ -223,12 +223,21 @@ struct http_pack_request {
>  	struct active_request_slot *slot;
>  
>  	/*
> -	 * After calling new_http_pack_request(), point lst to the head of the
> +	 * After calling new_http_pack_request(), if fetching a pack that
> +	 * http_get_info_packs() told us about, point lst to the head of the
>  	 * pack list that target is in. finish_http_pack_request() will remove
>  	 * target from lst and call install_packed_git() on target.
>  	 */
>  	struct packed_git **lst;
>  
> +	/*
> +	 * If this is true, finish_http_pack_request() will pass "--keep" to
> +	 * index-pack, resulting in the creation of a keep file, and will not
> +	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
> +	 * printed to stdout).
> +	 */
> +	unsigned generate_keep : 1;
> +

I suspect that this is a sign that this single patch is trying to
do too many things at the same time.

 - Whether we are fetching a single packfile from a URL, or walking
   to fetch all the packfiles in the repository at a given URL

 - Whether packfiles taken from outer space are marked with the
   "keep" bit

 - Whether the obtained packfile(s) are internally "installed"
   to the running process

are conceptually independent choices, but somehow mixed up, it
seems.

Thanks.


  reply	other threads:[~2020-05-29 23:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29 22:30 [PATCH 0/8] CDN offloading update Jonathan Tan
2020-05-29 22:30 ` [PATCH 1/8] http: use --stdin when getting dumb HTTP pack Jonathan Tan
2020-05-29 23:00   ` Junio C Hamano
2020-06-01 20:37     ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 2/8] http: improve documentation of http_pack_request Jonathan Tan
2020-05-29 22:30 ` [PATCH 3/8] http-fetch: support fetching packfiles by URL Jonathan Tan
2020-05-29 23:25   ` Junio C Hamano [this message]
2020-06-01 20:54     ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 4/8] Documentation: order protocol v2 sections Jonathan Tan
2020-05-29 23:32   ` Junio C Hamano
2020-06-01 20:57     ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 5/8] Documentation: add Packfile URIs design doc Jonathan Tan
2020-05-30  0:15   ` Junio C Hamano
2020-05-30  0:22     ` Junio C Hamano
2020-06-01 23:10       ` Jonathan Tan
2020-06-01 23:07     ` Jonathan Tan
2020-06-10  1:14     ` Jonathan Tan
2020-06-10 17:16       ` Junio C Hamano
2020-06-10 18:04         ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 6/8] upload-pack: refactor reading of pack-objects out Jonathan Tan
2020-05-29 22:30 ` [PATCH 7/8] fetch-pack: support more than one pack lockfile Jonathan Tan
2020-05-29 22:30 ` [PATCH 8/8] upload-pack: send part of packfile response as uri Jonathan Tan
2020-05-31 16:59   ` Junio C Hamano
2020-05-31 17:35   ` Junio C Hamano
2020-06-01 23:20     ` Jonathan Tan
2020-06-01 20:00   ` Jonathan Nieder
2020-06-10 20:57 ` [PATCH v2 0/9] CDN offloading update Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 1/9] http: use --stdin when indexing dumb HTTP pack Jonathan Tan
2020-06-11  1:10     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 2/9] http: refactor finish_http_pack_request() Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 3/9] http-fetch: refactor into function Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 4/9] http-fetch: support fetching packfiles by URL Jonathan Tan
2020-06-11  1:30     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 5/9] Documentation: order protocol v2 sections Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 6/9] Documentation: add Packfile URIs design doc Jonathan Tan
2020-06-11  1:55     ` Junio C Hamano
2020-11-25  9:15     ` Ævar Arnfjörð Bjarmason
2020-11-25 19:09       ` Jonathan Tan
2020-12-01 12:48         ` Ævar Arnfjörð Bjarmason
2020-06-10 20:57   ` [PATCH v2 7/9] upload-pack: refactor reading of pack-objects out Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 8/9] fetch-pack: support more than one pack lockfile Jonathan Tan
2020-06-11  1:41     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 9/9] upload-pack: send part of packfile response as uri Jonathan Tan
2020-06-10 23:16   ` [PATCH v2 0/9] CDN offloading update Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqeer2xr0f.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.