All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: matvore@google.com
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v4 04/10] list-objects-filter: implement composite filters
Date: Fri, 21 Jun 2019 17:26:26 -0700	[thread overview]
Message-ID: <20190622002626.245441-1-jonathantanmy@google.com> (raw)
In-Reply-To: <47a2680875e6f68fbf1f2e5a5a2630d263cdf426.1560558910.git.matvore@google.com>

> Allow combining filters such that only objects accepted by all filters
> are shown. The motivation for this is to allow getting directory
> listings without also fetching blobs. This can be done by combining
> blob:none with tree:<depth>. There are massive repositories that have
> larger-than-expected trees - even if you include only a single commit.

First of all, patches 2 and 3 are straightforward and LGTM. On to patch
4...

[snip]

> The current usage requires passing the filter to rev-list in the
> following form:
> 
> 	--filter=<FILTER1> --filter=<FILTER2> ...
> 
> Such usage is currently an error, so giving it a meaning is backwards-
> compatible.
> 
> The URL-encoding scheme is being introduced before the repeated flag
> logic, and the user-facing documentation for URL-encoding is being
> withheld until the repeated flag feature is implemented. The
> URL-encoding is in general not meant to be used directly by the user,
> and it is better to describe the URL-encoding feature in terms of the
> repeated flag.

As of this commit, we don't support such arguments passed to rev-list in
this way, so I would write these paragraphs as:

  A combined filter supports any number of subfilters, and is written in
  the following form:

    combine:<filter 1>+<filter 2>+<filter 3>

  Certain non-alphanumeric characters in each filter must be
  URL-encoded.

  For now, combined filters must be specified in this form. In a
  subsequent commit, rev-list will support multiple --filter arguments
  which will have the same effect as specifying one filter argument
  starting with "combine:".

> Helped-by: Emily Shaffer <emilyshaffer@google.com>
> Helped-by: Jeff Hostetler <git@jeffhostetler.com>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>  list-objects-filter-options.c       | 106 ++++++++++++++++++-
>  list-objects-filter-options.h       |  17 ++-
>  list-objects-filter.c               | 159 ++++++++++++++++++++++++++++
>  t/t6112-rev-list-filters-objects.sh | 151 +++++++++++++++++++++++++-
>  url.c                               |   6 ++
>  url.h                               |   8 ++
>  6 files changed, 441 insertions(+), 6 deletions(-)
> 
> @@ -28,22 +34,20 @@ static int gently_parse_list_objects_filter(
>  	struct strbuf *errbuf)
>  {
>  	const char *v0;
>  
>  	if (filter_options->choice) {
>  		strbuf_addstr(
>  			errbuf, _("multiple filter-specs cannot be combined"));
>  		return 1;
>  	}
>  
> -	filter_options->filter_spec = strdup(arg);
> -

This line has been removed from gently_parse_list_objects_filter()
because this function gains another caller that does not need it.
To compensate, this line has been added to both its existing callers.

> @@ -31,27 +32,37 @@ struct list_objects_filter_options {
>  	 * the filtering algorithm to use.
>  	 */
>  	enum list_objects_filter_choice choice;
>  
>  	/*
>  	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
>  	 */
>  	unsigned int no_filter : 1;
>  
>  	/*
> -	 * Parsed values (fields) from within the filter-spec.  These are
> -	 * choice-specific; not all values will be defined for any given
> -	 * choice.
> +	 * BEGIN choice-specific parsed values from within the filter-spec. Only
> +	 * some values will be defined for any given choice.
>  	 */
> +
>  	struct object_id *sparse_oid_value;
>  	unsigned long blob_limit_value;
>  	unsigned long tree_exclude_depth;
> +
> +	/* LOFC_COMBINE values */
> +
> +	/* This array contains all the subfilters which this filter combines. */
> +	size_t sub_nr, sub_alloc;
> +	struct list_objects_filter_options *sub;
> +
> +	/*
> +	 * END choice-specific parsed values.
> +	 */
>  };

I still think it's cleaner to just have a "left subfilter" and "right
subfilter", but I don't feel strongly about it. In any case, this is an
internal detail and can always be changed in the future.

> +	/*
> +	 * Optional. If this function is supplied and the filter needs to
> +	 * collect omits, then this function is called once before free_fn is
> +	 * called.
> +	 */
> +	void (*finalize_omits_fn)(struct oidset *omits, void *filter_data);

This is needed because a combined filter's omits actually lie in the
subfilters. Resolving it this way means that callers must call
list_objects_filter__free() before using the omits set. Can you add
documentation to __init() (which is the first function to take in the
omits set) and __free() describing this?

(As stated in the test below, we cannot just share one omits set amongst
all the subfilters - see filter_trees_update_omits and the call site
that relies on its return value.)

Here comes the tricky part...

> +static int should_delegate(enum list_objects_filter_situation filter_situation,
> +			   struct object *obj,
> +			   struct subfilter *sub)
> +{
> +	if (!sub->is_skipping_tree)
> +		return 1;
> +	if (filter_situation == LOFS_END_TREE &&
> +		oideq(&obj->oid, &sub->skip_tree)) {
> +		sub->is_skipping_tree = 0;
> +		return 1;
> +	}
> +	return 0;
> +}

Optional: I think this should be called "test_and_set_skip_tree" or
something like that, made to return the inverse of its current return
value, and documented:

  Returns the value of sub->is_skipping_tree at the moment of
  invocation. If iteration is at the LOFS_END_TREE of the tree currently
  being skipped, first clears sub->is_skipping_tree before returning.

> +static enum list_objects_filter_result process_subfilter(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct subfilter *sub)
> +{
> +	enum list_objects_filter_result result;
> +
> +	/*
> +	 * Check should_delegate before oidset_contains so that
> +	 * is_skipping_tree gets unset even when the object is marked as seen.
> +	 * As of this writing, no filter uses LOFR_MARK_SEEN on trees that also
> +	 * uses LOFR_SKIP_TREE, so the ordering is only theoretically
> +	 * important. Be cautious if you change the order of the below checks
> +	 * and more filters have been added!
> +	 */
> +	if (!should_delegate(filter_situation, obj, sub))
> +		return LOFR_ZERO;
> +	if (oidset_contains(&sub->seen, &obj->oid))
> +		return LOFR_ZERO;
> +
> +	result = list_objects_filter__filter_object(
> +		r, filter_situation, obj, pathname, filename, sub->filter);
> +
> +	if (result & LOFR_MARK_SEEN)
> +		oidset_insert(&sub->seen, &obj->oid);
> +
> +	if (result & LOFR_SKIP_TREE) {
> +		sub->is_skipping_tree = 1;
> +		sub->skip_tree = obj->oid;
> +	}
> +
> +	return result;
> +}

Looks good.

> +static enum list_objects_filter_result filter_combine(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct oidset *omits,
> +	void *filter_data)
> +{
> +	struct combine_filter_data *d = filter_data;
> +	enum list_objects_filter_result combined_result =
> +		LOFR_DO_SHOW | LOFR_MARK_SEEN | LOFR_SKIP_TREE;
> +	size_t sub;
> +
> +	for (sub = 0; sub < d->nr; sub++) {
> +		enum list_objects_filter_result sub_result = process_subfilter(
> +			r, filter_situation, obj, pathname, filename,
> +			&d->sub[sub]);
> +		if (!(sub_result & LOFR_DO_SHOW))
> +			combined_result &= ~LOFR_DO_SHOW;
> +		if (!(sub_result & LOFR_MARK_SEEN))
> +			combined_result &= ~LOFR_MARK_SEEN;
> +		if (!d->sub[sub].is_skipping_tree)
> +			combined_result &= ~LOFR_SKIP_TREE;
> +	}
> +
> +	return combined_result;
> +}

And also looks good. Might be confusing for tree skipping to be
communicated through is_skipping_tree instead of the return value, but
is_skipping_tree needs to be set anyway for other reasons, so that's
convenient.

  parent reply	other threads:[~2019-06-22  0:26 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-01  0:35 [PATCH v2 0/9] Filter combination Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 1/9] list-objects-filter: make API easier to use Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 2/9] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 3/9] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 4/9] list-objects-filter: implement composite filters Matthew DeVore
2019-06-03 21:51   ` Jeff Hostetler
2019-06-06 22:32     ` Matthew DeVore
2019-06-07 17:58       ` Jeff Hostetler
2019-06-01  0:35 ` [PATCH v2 5/9] list-objects-filter-options: move error check up Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 6/9] list-objects-filter-options: make filter_spec a strbuf Matthew DeVore
2019-06-10 20:13   ` Junio C Hamano
2019-06-11  0:34     ` Matthew DeVore
2019-06-11 17:33       ` Junio C Hamano
2019-06-11 18:44         ` Matthew DeVore
2019-06-11 21:34           ` Matthew DeVore
2019-06-11 21:48           ` Junio C Hamano
2019-06-12  0:37             ` Matthew DeVore
2019-06-12 14:55               ` Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 7/9] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 8/9] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-03 22:07   ` Jacob Keller
2019-06-03 22:39     ` Matthew DeVore
2019-06-04  3:16       ` Jacob Keller
2019-06-01  0:36 ` [PATCH v2 9/9] list-objects-filter-options: make parser void Matthew DeVore
2019-06-03 21:35 ` [PATCH v2 0/9] Filter combination Jeff Hostetler
2019-06-13 21:51 ` [PATCH v3 00/10] " Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-14 19:50   ` [PATCH v3 00/10] Filter combination Junio C Hamano
2019-06-15  0:40 ` [PATCH v4 " Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-21 22:58     ` Jonathan Tan
2019-06-27  0:46       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-18  8:42     ` Johannes Schindelin
2019-06-18 20:22       ` Matthew DeVore
2019-06-21 18:17         ` Johannes Schindelin
2019-06-22  0:26     ` Jonathan Tan [this message]
2019-06-27 21:12       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-22  0:37     ` Jonathan Tan
2019-06-27 21:17       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-22  0:46     ` Jonathan Tan
2019-06-27 21:24       ` Matthew DeVore
2019-06-27 22:27         ` Matthew DeVore
2019-06-18  1:25   ` [PATCH v4 00/10] Filter combination Junio C Hamano
2019-06-27 22:54 ` [PATCH v5 " Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 01/10] list-objects-filter: encapsulate filter components Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-28 16:05   ` [PATCH v5 00/10] Filter combination Junio C Hamano
2019-06-28 17:16   ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190622002626.245441-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=matvore@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.