git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Li Linchao via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	dscho <johannes.schindelin@gmx.de>,
	Li Linchao <lilinchao@oschina.cn>
Subject: Re: [PATCH v3] builtin/clone.c: add --reject-shallow option
Date: Tue, 09 Feb 2021 12:32:44 -0800	[thread overview]
Message-ID: <xmqqy2fx0yoj.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <pull.865.v3.git.1612793576910.gitgitgadget@gmail.com> (Li Linchao via GitGitGadget's message of "Mon, 08 Feb 2021 14:12:56 +0000")

"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: lilinchao <lilinchao@oschina.cn>
>
> In some scenarios, users may want more history than the repository
> offered for cloning, which mostly to be a shallow repository, can
> give them.

Sorry, but I find this hard to understand.  Are you saying that most
of the repositories that users try to clone from are shallow?

> But users don't know it is a shallow repository until
> they download it to local, users should have the option to refuse
> to clone this kind of repository, and may want to exit the process
> immediately without creating any unnecessary files.

This one on the other hand is easy to understand, but we would
probably need something like s/But/But because/.

> Althought there is an option '--depth=x' for users to decide how
> deep history they can fetch, but as the unshallow cloning's depth
> is INFINITY, we can't know exactly the minimun 'x' value that can
> satisfy the minimum integrity, so we can't pass 'x' value to --depth,
> and expect this can obtain a complete history of a repository.

Hmph, that is an interesting point.  This makes me wonder if we can
achieve the same without adding a new option at the UI level (e.g.
by allowing "--depth" to take "infinity" and reject cloning if we
find out that the origin repository is a shallow one).  But we can
worry about it later once after we get the machinery driven by the
UI right.

> In other scenarios, given that we have an API that allow us to import
> external repository, and then perform various operations on the repo.

Sorry, but I do not understand what you want to say with these two
lines ("Given that X and Y" needs to be followed by a concluding
statement, e.g. "Given that we have API to import and operate, we
can do Z"---you are missing that "we can do Z" part).

> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 02d9c19cec75..af5a97903a05 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -15,7 +15,7 @@ SYNOPSIS
>  	  [--dissociate] [--separate-git-dir <git dir>]
>  	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
>  	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
> -	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
> +	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--reject-shallow]

Isn't it "--[no-]reject-shallow"?  Offering the negation from the
command line is essential if "[clone] rejectshallow" configuration
is allowed to set the default to true.

> +--reject-shallow::
> +	Don't clone a shallow source repository. In some scenarios, clients
> +	want the cloned repository information to be complete. Otherwise,
> +	the cloning process should end immediately without creating any files,
> +	which can save some disk space. This can override `clone.rejectshallow`
> +	from the configuration:
> +
> +	--------------------------------------------------------------------
> +	$ git -c clone.rejectshallow=false clone --reject-shallow source out
> +	--------------------------------------------------------------------
> +
> +	While there is a way to countermand a configured "I always refuse to
> +	clone from a shallow repository" with "but let's allow it only this time":
> +
> +	----------------------------------------------------------------------
> +	$ git -c clone.rejectshallow=true clone --no-reject-shallow source out
> +	----------------------------------------------------------------------


This is way too verbose and gives unnecessary details that readers
already know or do not need to know (e.g. setting configuration from
the command line and immediately override it from the command line
is not something end-users would EVER need to do---only test writers
who develop Git would need it).  Something like

    Fail if the source repository is a shallow repository.  The
    `clone.rejectShallow` configuration variable can be used to
    give the default.   

would be sufficient.  All readers ought to know when a configuration
and command line option exist, the latter can be used to override
the default former gives, and it is *not* a job for the description
of an individual option to teach them to such a detail like the
above does.

> +static int option_no_shallow = -1;  /* unspecified */
> +static int config_shallow = -1;    /* unspecified */

Hmph.  I would have expected the usual "prepare a single variable
and initialize it to the default, read the config to set it, and
then parse the command line to overwrite it" sequence would suffice
so it is puzzling why we want two separate variables here.

Let's read on to find out.

> @@ -90,6 +92,8 @@ static struct option builtin_clone_options[] = {
>  	OPT__VERBOSITY(&option_verbosity),
>  	OPT_BOOL(0, "progress", &option_progress,
>  		 N_("force progress reporting")),
> +	OPT_BOOL(0, "reject-shallow", &option_no_shallow,
> +		 N_("don't clone shallow repository")),
>  	OPT_BOOL('n', "no-checkout", &option_no_checkout,
>  		 N_("don't create a checkout")),
>  	OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")),
> @@ -858,6 +862,9 @@ static int git_clone_config(const char *k, const char *v, void *cb)
>  		free(remote_name);
>  		remote_name = xstrdup(v);
>  	}
> +	if (!strcmp(k, "clone.rejectshallow")) {
> +		config_shallow = git_config_bool(k, v);
> +	}
>  	return git_default_config(k, v, cb);
>  }

You are adding to git_clone_config(), so instead of setting the
value to config_shallow, setting the value to the same variable that
will be used in builtin_clone_options[] array should be sufficient.

cmd_clone() begins like so:


	git_config(git_clone_config, NULL);
	argc = parse_options(...);

which means that single variable (let's call it reject_shallow)
can (1) stay to be its initial value if no config or option is
given, (2) if there is config, the git_config() call will cause
that variable assigned, (3) if there is option, parse_options()
call will cause that variable assigned, possibly overwriting the
value taken from the config.

Which is exactly what we want.  So in short, declare just a single

    static int reject_shallow; /* default to false */
    
instead of "option_no_shallow" and "config_shallow", and use it in
both builtin_clone_options[] given to parse_options, and
git_clone_config() that is given to git_config(), and you'd be fine,
I think.

> @@ -963,6 +970,7 @@ static int path_exists(const char *path)
>  int cmd_clone(int argc, const char **argv, const char *prefix)
>  {
>  	int is_bundle = 0, is_local;
> +	int is_shallow = 0;
>  	const char *repo_name, *repo, *work_tree, *git_dir;
>  	char *path, *dir, *display_repo = NULL;
>  	int dest_exists, real_dest_exists = 0;
> @@ -1205,6 +1213,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  
>  	path = get_repo_path(remote->url[0], &is_bundle);
>  	is_local = option_local != 0 && path && !is_bundle;
> +
> +	/* Detect if the remote repository is shallow */
> +	if (!access(mkpath("%s/shallow", path), F_OK)) {
> +		is_shallow = 1;
> +	}

This is only for cloning from a local repository, no?  IOW, path at
this point may even be "git://github.com/git/git/" and checking with
access() does not make sense.

Ah, it is even worse.  get_repo_path() can return NULL, so mkpath()
will crash in such a case.  This must be at least

	if (path && !access(mkpath("%s/shallow", path), F_OK))
		is_shallow = 1;

but I think the logic fits better in the body of "if (is_Local)"
thing that immediately follows.  It is specific to the case where
cloning from a local repository and access(mkpath()) that is about
the local filesystem (as opposed to going through the transport
layer) belongs there.

>  	if (is_local) {
>  		if (option_depth)
>  			warning(_("--depth is ignored in local clones; use file:// instead."));
> @@ -1214,7 +1228,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
>  		if (filter_options.choice)
>  			warning(_("--filter is ignored in local clones; use file:// instead."));
> -		if (!access(mkpath("%s/shallow", path), F_OK)) {
> +		if (is_shallow) {
>  			if (option_local > 0)
>  				warning(_("source repository is shallow, ignoring --local"));
>  			is_local = 0;

So, I think the above two hunks are making the code worse.  If we
are to detect and reject cloning from the shallow repository when
going through the transport layer (i.e. "--no-local" or cloning from
"git://github.com/git/git", or "https://github.com/git/git", if it
were a shallow repository), that must be handled separately.

> @@ -1222,6 +1236,25 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	}
>  	if (option_local > 0 && !is_local)
>  		warning(_("--local is ignored"));
> +
> +	if (is_shallow) {
> +		int reject = 0;
> +
> +		/* If option_no_shallow is specified from CLI option,
> +		 * ignore config_shallow from git_clone_config.
> +		 */
> +
> +		if (config_shallow != -1) {
> +			reject = config_shallow;
> +		}
> +		if (option_no_shallow != -1) {
> +			reject = option_no_shallow;
> +		}

I do not think any of the above is necessary with just a single
reject_shallow variable that is initialized to 0, can be set by
git_config() callback, and can further be set by parse_options().

> +		if (reject) {
> +			die(_("source repository is shallow, reject to clone."));
> +		}

> +	}
> +
>  	transport->cloning = 1;
>  
>  	transport_set_option(transport, TRANS_OPT_KEEP, "yes");

I do not see how this change would allow users to reject cloning
http://github.com/git/git, if that repository were shallow, though.
I think that would need changes to the code that interacts with
these transport_* functions we see later part of this functrion.

Thanks.

  reply	other threads:[~2021-02-09 20:38 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-04  3:31 [PATCH] builtin/clone.c: add --no-shallow option Li Linchao via GitGitGadget
2021-02-04  5:50 ` Junio C Hamano
2021-02-04 10:32   ` lilinchao
2021-02-04 18:36     ` Junio C Hamano
2021-02-04 14:00 ` Johannes Schindelin
2021-02-04 18:24   ` Junio C Hamano
2021-02-08  8:31 ` [PATCH v2 0/2] " Li Linchao via GitGitGadget
2021-02-08  8:31   ` [PATCH v2 1/2] " lilinchao via GitGitGadget
2021-02-08  8:31   ` [PATCH v2 2/2] builtin/clone.c: add --reject-shallow option lilinchao via GitGitGadget
2021-02-08 13:33   ` [PATCH v2 0/2] builtin/clone.c: add --no-shallow option Derrick Stolee
     [not found]   ` <32bb0d006a1211ebb94254a05087d89a835@gmail.com>
2021-02-08 13:48     ` lilinchao
2021-02-08 14:12   ` [PATCH v3] builtin/clone.c: add --reject-shallow option Li Linchao via GitGitGadget
2021-02-09 20:32     ` Junio C Hamano [this message]
     [not found]     ` <026bd8966b1611eb975aa4badb2c2b1190694@pobox.com>
2021-02-10  9:07       ` lilinchao
2021-02-10 16:27         ` Junio C Hamano
     [not found]         ` <eaa219a86bbc11ebb6c7a4badb2c2b1165032@pobox.com>
2021-02-20 10:40           ` lilinchao
2021-02-21  7:05     ` [PATCH v4] " Li Linchao via GitGitGadget
2021-02-22 18:12       ` Junio C Hamano
2021-03-01 22:03         ` Jonathan Tan
2021-03-01 22:34           ` Junio C Hamano
2021-03-02  8:44           ` lilinchao
2021-03-03 23:59           ` Junio C Hamano
2021-03-04  1:53             ` Jonathan Tan
     [not found]       ` <8f3c00de753911eb93d3d4ae5278bc1270191@pobox.com>
2021-02-28 17:58         ` lilinchao
2021-02-28 18:06       ` [PATCH v5] " Li Linchao via GitGitGadget
2021-03-01  7:11         ` lilinchao
2021-03-01 22:40           ` Johannes Schindelin
2021-03-04  6:26             ` lilinchao
2021-03-03 23:21         ` Junio C Hamano
2021-03-04  5:50           ` lilinchao
2021-03-04 17:19         ` [PATCH v6] " Li Linchao via GitGitGadget
2021-03-12 18:25           ` lilinchao
2021-03-25 11:09           ` [PATCH v7] " Li Linchao via GitGitGadget
2021-03-25 20:31             ` Junio C Hamano
2021-03-25 22:57             ` Junio C Hamano
     [not found]             ` <19c9dc128da911ebacc7d4ae5278bc1233465@pobox.com>
2021-03-26  3:34               ` lilinchao
     [not found]             ` <7a71c96c8dbd11eb8bb0d4ae5278bc1296681@pobox.com>
2021-03-26  3:49               ` lilinchao
2021-03-29 10:19             ` [PATCH v8] " Li Linchao via GitGitGadget
2021-03-29 21:36               ` Junio C Hamano
2021-03-30  9:54               ` Johannes Schindelin
2021-03-30 17:46                 ` Junio C Hamano
2021-03-31 13:30                   ` Johannes Schindelin
     [not found]               ` <f8b2582c913d11ebaddbd4ae5278bc1214940@gmx.de>
2021-03-31 11:03                 ` lilinchao
2021-03-31 15:51               ` [PATCH v9] " lilinchao via GitGitGadget
2021-03-31 19:14                 ` Junio C Hamano
2021-03-31 22:24                   ` Johannes Schindelin
2021-03-31 22:37                     ` Junio C Hamano
2021-04-01 10:46                 ` [PATCH v10] " Li Linchao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy2fx0yoj.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=lilinchao@oschina.cn \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).