git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
	jonathantanmy@google.com, Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v5 5/6] rev-list: add list-objects filtering support
Date: Wed, 22 Nov 2017 12:08:53 -0800	[thread overview]
Message-ID: <20171122200853.GB11671@aiede.mtv.corp.google.com> (raw)
In-Reply-To: <20171121205852.15731-6-git@jeffhostetler.com>

Hi,

Jeff Hostetler wrote:

> Teach rev-list to use the filtering provided by the
> traverse_commit_list_filtered() interface to omit
> unwanted objects from the result.
>
> Object filtering is only allowed when one of the "--objects*"
> options are used.

micronit: the line widths seem to be uneven in these commit messages,
which is a bit distracting when reading.

> When the "--filter-print-omitted" option is used, the omitted
> objects are printed at the end.  These are marked with a "~".
> This option can be combined with "--quiet" to get a list of
> just the omitted objects.

Neat.  Can you give a quick example?

Using --quiet for this feels a bit odd, since it previously meant
to print nothing to stdout.  I wonder if there's another way ---
e.g.

	--print-omitted=(yes|no|only)

If I wanted to list all objects matching a filter, even objects
that are not reachable from any ref, is there a way to do that?
(Just curious, trying to think about this interface.)

> Add t6112 test.

This part doesn't need to be in the commit message.  More generally,
anything I could more easily learn from the code or diffstat doesn't
need to be in the commit message: the commit message is about the
"why" more than the details of what in the code changed.

> In the future, we will introduce a "partial clone" mechanism
> wherein an object in a repo, obtained from a remote, may
> reference a missing object that can be dynamically fetched from
> that remote once needed.  This "partial clone" mechanism will
> have a way, sometimes slow, of determining if a missing link
> is one of the links expected to be produced by this mechanism.

Does this mean the <filter-spec>s will be part of the wire protocol?
I'll look more carefully at them below with that in mind.

> This patch introduces handling of missing objects to help
> debugging and development of the "partial clone" mechanism,
> and once the mechanism is implemented, for a power user to
> perform operations that are missing-object aware without
> incurring the cost of checking if a missing link is expected.

I had trouble understanding what this paragraph is about.  Can you
give an example?

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  Documentation/git-rev-list.txt      |   4 +-
>  Documentation/rev-list-options.txt  |  36 ++++++
>  builtin/rev-list.c                  | 108 ++++++++++++++++-
>  t/t6112-rev-list-filters-objects.sh | 225 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 370 insertions(+), 3 deletions(-)
>  create mode 100755 t/t6112-rev-list-filters-objects.sh

Looks reasonably concise, good.

[...]
> --- a/Documentation/git-rev-list.txt
> +++ b/Documentation/git-rev-list.txt
> @@ -47,7 +47,9 @@ SYNOPSIS
>  	     [ --fixed-strings | -F ]
>  	     [ --date=<format>]
>  	     [ [ --objects | --objects-edge | --objects-edge-aggressive ]
> -	       [ --unpacked ] ]
> +	       [ --unpacked ]
> +	       [ --filter=<filter-spec> [ --filter-print-omitted ] ] ]

Does this mean --filter is only useful with --objects?  E.g. I can't
use it to filter commits?

> +	     [ --missing=<missing-action> ]

--missing=(error|allow-any|print) would be more informative and about
equally concise.

Since this is mainly for debugging, does it have a different
compatibility guarantee from other options?  Could it be named
accordingly to set expectations?

[...]
> +The form '--filter=blob:none' omits all blobs.

Sounds sensible.

> ++
> +The form '--filter=blob:limit=<n>[kmg]' omits blobs larger than n bytes
> +or units.  The value may be zero.

On second thought, doesn't blob:limit=0 mean blob:none is not needed?
Is it for future consistency with tree:none?

What units do [kmg] use? Are they GB, GiB, or one of the variants in
between?

> ++
> +The form '--filter=sparse:oid=<oid-ish>' uses a sparse-checkout
> +specification contained in the object (or the object that the expression
> +evaluates to) to omit blobs that would not be not required for a
> +sparse checkout on the requested refs.

This one makes me a little nervous because it would mean we're
planning on adding sparse-checkout specifications to the wire
protocol.  Maybe that's okay --- they're already part of the on-disk
format --- but it makes me nervous because the sparse-checkout format
is not so great, as I believe MS has already noticed.

What is an <oid-ish>?  Can it just say <blob>?  How would this one
work when passed over the wire?

> ++
> +The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
> +specification contained in <path>.

Is this <path> relative to the cwd of the caller, or is it within some
commit?

Sorry it took so long to send this feedback / these questions.
Hopefully it's useful nevertheless.

Thanks and hope that helps,
Jonathan

  reply	other threads:[~2017-11-22 20:09 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-21 20:58 [PATCH v5 0/6] Partial clone part 1: object filtering Jeff Hostetler
2017-11-21 20:58 ` [PATCH v5 1/6] dir: allow exclusions from blob in addition to file Jeff Hostetler
2017-11-21 20:58 ` [PATCH v5 2/6] oidmap: add oidmap iterator methods Jeff Hostetler
2017-11-21 20:58 ` [PATCH v5 3/6] oidset: add iterator methods to oidset Jeff Hostetler
2017-11-21 20:58 ` [PATCH v5 4/6] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-11-22 22:56   ` Stefan Beller
2017-11-27 19:39     ` Jeff Hostetler
2017-11-30 22:03       ` Jeff King
2017-11-21 20:58 ` [PATCH v5 5/6] rev-list: add list-objects filtering support Jeff Hostetler
2017-11-22 20:08   ` Jonathan Nieder [this message]
2017-11-29 14:51     ` Jeff Hostetler
2017-11-21 20:58 ` [PATCH v5 6/6] pack-objects: add list-objects filtering Jeff Hostetler
2017-11-22  1:37 ` [PATCH v5 0/6] Partial clone part 1: object filtering Jonathan Tan
2017-11-22  5:14   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171122200853.GB11671@aiede.mtv.corp.google.com \
    --to=jrnieder@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).