All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josh Steadmon <steadmon@google.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] clone, submodule: pass partial clone filters to submodules
Date: Tue, 1 Feb 2022 13:34:02 -0800	[thread overview]
Message-ID: <YfmnSkrIMgDd2nbL@google.com> (raw)
In-Reply-To: <CABPp-BHf7audy2X=q14BM5PgLhkuqDCb+vcASVqRDRfdAUewpQ@mail.gmail.com>

On 2022.01.25 13:08, Elijah Newren wrote:
> On Fri, Jan 21, 2022 at 4:31 PM Josh Steadmon <steadmon@google.com> wrote:
> >
> > When cloning a repo with a --filter and with --recurse-submodules
> > enabled, the partial clone filter only applies to
> > the top-level repo. This can lead to unexpected bandwidth and disk
> > usage for projects which include large submodules. For example, a user
> > might wish to make a partial clone of Gerrit and would run:
> > `git clone --recurse-submodules --filter=blob:5k
> > https://gerrit.googlesource.com/gerrit`. However, only the superproject
> > would be a partial clone; all the submodules would have all blobs
> > downloaded regardless of their size. With this change, the same filter
> > applies to submodules, meaning the expected bandwidth and disk savings
> > apply consistently.
> >
> > Plumb the --filter argument from git-clone through git-submodule and
> > git-submodule--helper, such that submodule clones also have the filter
> > applied.
> >
> > This applies the same filter to the superproject and all submodules.
> > Users who prefer the current behavior (i.e., a filter only on the
> > superproject) would need to clone with `--no-recurse-submodules` and
> > then manually initialize each submodule.
> >
> > Applying filters to submodules should be safe thanks to Jonathan Tan's
> > recent work [1, 2, 3] eliminating the use of alternates as a method of
> > accessing submodule objects, so any submodule object access now triggers
> > a lazy fetch from the submodule's promisor remote if the accessed object
> > is missing. This patch is an updated version of [4], which was created
> > prior to Jonathan Tan's work.
> >
> > [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16)
> > [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate',
> >         2021-09-20)
> > [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules',
> >         2021-10-25)
> > [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/
> >
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > ---
> >  builtin/clone.c                    |  4 ++++
> >  builtin/submodule--helper.c        | 30 ++++++++++++++++++++++---
> >  git-submodule.sh                   | 17 +++++++++++++-
> >  t/t5617-clone-submodules-remote.sh | 17 ++++++++++++++
> >  t/t7814-grep-recurse-submodules.sh | 36 ++++++++++++++++++++++++++++++
> >  5 files changed, 100 insertions(+), 4 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 727e16e0ae..196e947714 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -729,6 +729,10 @@ static int checkout(int submodule_progress)
> >                         strvec_push(&args, "--no-fetch");
> >                 }
> >
> > +               if (filter_options.choice)
> > +                       strvec_pushf(&args, "--filter=%s",
> > +                                    expand_list_objects_filter_spec(&filter_options));
> > +
> >                 if (option_single_branch >= 0)
> >                         strvec_push(&args, option_single_branch ?
> >                                                "--single-branch" :
> > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> > index c5d3fc3817..11552970f2 100644
> > --- a/builtin/submodule--helper.c
> > +++ b/builtin/submodule--helper.c
> > @@ -20,6 +20,7 @@
> >  #include "diff.h"
> >  #include "object-store.h"
> >  #include "advice.h"
> > +#include "list-objects-filter-options.h"
> >
> >  #define OPT_QUIET (1 << 0)
> >  #define OPT_CACHED (1 << 1)
> > @@ -1630,6 +1631,7 @@ struct module_clone_data {
> >         const char *name;
> >         const char *url;
> >         const char *depth;
> > +       struct list_objects_filter_options *filter_options;
> >         struct string_list reference;
> >         unsigned int quiet: 1;
> >         unsigned int progress: 1;
> > @@ -1796,6 +1798,10 @@ static int clone_submodule(struct module_clone_data *clone_data)
> >                         strvec_push(&cp.args, "--dissociate");
> >                 if (sm_gitdir && *sm_gitdir)
> >                         strvec_pushl(&cp.args, "--separate-git-dir", sm_gitdir, NULL);
> > +               if (clone_data->filter_options && clone_data->filter_options->choice)
> > +                       strvec_pushf(&cp.args, "--filter=%s",
> > +                                    expand_list_objects_filter_spec(
> > +                                            clone_data->filter_options));
> >                 if (clone_data->single_branch >= 0)
> >                         strvec_push(&cp.args, clone_data->single_branch ?
> >                                     "--single-branch" :
> > @@ -1852,6 +1858,7 @@ static int module_clone(int argc, const char **argv, const char *prefix)
> >  {
> >         int dissociate = 0, quiet = 0, progress = 0, require_init = 0;
> >         struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT;
> > +       struct list_objects_filter_options filter_options;
> >
> >         struct option module_clone_options[] = {
> >                 OPT_STRING(0, "prefix", &clone_data.prefix,
> > @@ -1881,17 +1888,19 @@ static int module_clone(int argc, const char **argv, const char *prefix)
> >                            N_("disallow cloning into non-empty directory")),
> >                 OPT_BOOL(0, "single-branch", &clone_data.single_branch,
> >                          N_("clone only one branch, HEAD or --branch")),
> > +               OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
> >                 OPT_END()
> >         };
> >
> >         const char *const git_submodule_helper_usage[] = {
> >                 N_("git submodule--helper clone [--prefix=<path>] [--quiet] "
> >                    "[--reference <repository>] [--name <name>] [--depth <depth>] "
> > -                  "[--single-branch] "
> > +                  "[--single-branch] [--filter <filter-spec>]"
> >                    "--url <url> --path <path>"),
> >                 NULL
> >         };
> >
> > +       memset(&filter_options, 0, sizeof(filter_options));
> >         argc = parse_options(argc, argv, prefix, module_clone_options,
> >                              git_submodule_helper_usage, 0);
> >
> > @@ -1899,12 +1908,14 @@ static int module_clone(int argc, const char **argv, const char *prefix)
> >         clone_data.quiet = !!quiet;
> >         clone_data.progress = !!progress;
> >         clone_data.require_init = !!require_init;
> > +       clone_data.filter_options = &filter_options;
> >
> >         if (argc || !clone_data.url || !clone_data.path || !*(clone_data.path))
> >                 usage_with_options(git_submodule_helper_usage,
> >                                    module_clone_options);
> >
> >         clone_submodule(&clone_data);
> > +       list_objects_filter_release(&filter_options);
> >         return 0;
> >  }
> >
> > @@ -1994,6 +2005,7 @@ struct submodule_update_clone {
> >         const char *recursive_prefix;
> >         const char *prefix;
> >         int single_branch;
> > +       struct list_objects_filter_options *filter_options;
> >
> >         /* to be consumed by git-submodule.sh */
> >         struct update_clone_data *update_clone;
> > @@ -2154,6 +2166,9 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
> >                 strvec_pushl(&child->args, "--prefix", suc->prefix, NULL);
> >         if (suc->recommend_shallow && sub->recommend_shallow == 1)
> >                 strvec_push(&child->args, "--depth=1");
> > +       if (suc->filter_options && suc->filter_options->choice)
> > +               strvec_pushf(&child->args, "--filter=%s",
> > +                            expand_list_objects_filter_spec(suc->filter_options));
> >         if (suc->require_init)
> >                 strvec_push(&child->args, "--require-init");
> >         strvec_pushl(&child->args, "--path", sub->path, NULL);
> > @@ -2498,6 +2513,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
> >         const char *update = NULL;
> >         struct pathspec pathspec;
> >         struct submodule_update_clone suc = SUBMODULE_UPDATE_CLONE_INIT;
> > +       struct list_objects_filter_options filter_options;
> > +       int ret;
> >
> >         struct option module_update_clone_options[] = {
> >                 OPT_STRING(0, "prefix", &prefix,
> > @@ -2528,6 +2545,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
> >                            N_("disallow cloning into non-empty directory")),
> >                 OPT_BOOL(0, "single-branch", &suc.single_branch,
> >                          N_("clone only one branch, HEAD or --branch")),
> > +               OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
> >                 OPT_END()
> >         };
> >
> > @@ -2540,20 +2558,26 @@ static int update_clone(int argc, const char **argv, const char *prefix)
> >         update_clone_config_from_gitmodules(&suc.max_jobs);
> >         git_config(git_update_clone_config, &suc.max_jobs);
> >
> > +       memset(&filter_options, 0, sizeof(filter_options));
> >         argc = parse_options(argc, argv, prefix, module_update_clone_options,
> >                              git_submodule_helper_usage, 0);
> > +       suc.filter_options = &filter_options;
> >
> >         if (update)
> >                 if (parse_submodule_update_strategy(update, &suc.update) < 0)
> >                         die(_("bad value for update parameter"));
> >
> > -       if (module_list_compute(argc, argv, prefix, &pathspec, &suc.list) < 0)
> > +       if (module_list_compute(argc, argv, prefix, &pathspec, &suc.list) < 0) {
> > +               list_objects_filter_release(&filter_options);
> >                 return 1;
> > +       }
> >
> >         if (pathspec.nr)
> >                 suc.warn_if_uninitialized = 1;
> >
> > -       return update_submodules(&suc);
> > +       ret = update_submodules(&suc);
> > +       list_objects_filter_release(&filter_options);
> > +       return ret;
> >  }
> >
> >  static int run_update_procedure(int argc, const char **argv, const char *prefix)
> > diff --git a/git-submodule.sh b/git-submodule.sh
> > index 652861aa66..926f6873d3 100755
> > --- a/git-submodule.sh
> > +++ b/git-submodule.sh
> > @@ -10,7 +10,7 @@ USAGE="[--quiet] [--cached]
> >     or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
> >     or: $dashless [--quiet] init [--] [<path>...]
> >     or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...)
> > -   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--[no-]single-branch] [--] [<path>...]
> > +   or: $dashless [--quiet] update [--init [--filter=<filter-spec>]] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--[no-]single-branch] [--] [<path>...]
> >     or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path>
> >     or: $dashless [--quiet] set-url [--] <path> <newurl>
> >     or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
> > @@ -49,6 +49,7 @@ dissociate=
> >  single_branch=
> >  jobs=
> >  recommend_shallow=
> > +filter=
> >
> >  die_if_unmatched ()
> >  {
> > @@ -347,6 +348,14 @@ cmd_update()
> >                 --no-single-branch)
> >                         single_branch="--no-single-branch"
> >                         ;;
> > +               --filter)
> > +                       case "$2" in '') usage ;; esac
> > +                       filter="--filter=$2"
> > +                       shift
> > +                       ;;
> > +               --filter=*)
> > +                       filter=$1
> 
> Shouldn't this be
>      filter="$1"
> ?  I think it'd work currently without the quotes, but seeing the
> discussion of special characters for --filter=combine in
> git-rev-list(1) make me worry that this could become unsafe in the
> future.

Yes, thanks for the catch. Will fix in V2.

  reply	other threads:[~2022-02-01 21:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-21  3:32 [PATCH] clone, submodule: pass partial clone filters to submodules Josh Steadmon
2022-01-22  1:49 ` Junio C Hamano
2022-01-25 21:00   ` Elijah Newren
2022-01-26  6:03     ` Junio C Hamano
2022-02-01 21:33       ` Josh Steadmon
2022-01-25 21:08 ` Elijah Newren
2022-02-01 21:34   ` Josh Steadmon [this message]
2022-02-05  0:40 ` [PATCH v2] " Josh Steadmon
2022-02-05  0:54   ` Josh Steadmon
2022-02-05  1:00     ` Josh Steadmon
2022-02-05  5:00 ` [PATCH v3] " Josh Steadmon
2022-02-09 22:44   ` Jonathan Tan
2022-02-09 23:37     ` Junio C Hamano
2022-02-19 17:30   ` [PATCH] t5617,t7814: remove unnecessary 'uploadpack.allowanysha1inwant' Philippe Blain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YfmnSkrIMgDd2nbL@google.com \
    --to=steadmon@google.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.