* [PATCH v2 00/19] WIP object filtering for partial clone
@ 2017-07-13 17:34 Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

This WIP is a follow-up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]

Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback.  This hides the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to actual commands, such as pack-objects.

This version adds that same filtering capability to rev-list allowing
filtering to be demonstrated without building a packfile.  Filtered
blobs are printed with a leading "~" (along with their sizes).

    $ ./git rev-list --objects HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).

2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
   (but always including ".git*" special files).

3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
   corresponding sparse-checkout.
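
As an illustration of the <n>[kmg] argument taken by the second filter,
here is a minimal sketch of how such a value could be turned into a byte
count.  parse_size_limit() is a hypothetical helper and is not part of
this series; treating the suffixes as case-insensitive powers of 1024
is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the series): parse "<n>[kmg]" into a
 * byte count.  Assumes k/m/g mean 2^10, 2^20 and 2^30 bytes.
 * Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_size_limit(const char *arg, int64_t *out)
{
	char *end;
	unsigned long long n;
	int shift = 0;

	assert(out != NULL);
	if (!arg || !isdigit((unsigned char)arg[0]))
		return -1;	/* rejects "", "k", "-5", " 5" */

	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno)
		return -1;

	switch (tolower((unsigned char)*end)) {
	case 'k': shift = 10; end++; break;
	case 'm': shift = 20; end++; break;
	case 'g': shift = 30; end++; break;
	case '\0': break;	/* plain byte count */
	default: return -1;
	}
	if (*end)
		return -1;	/* trailing garbage after the suffix */
	if (n > ((unsigned long long)INT64_MAX >> shift))
		return -1;	/* result would not fit in int64_t */

	*out = (int64_t)(n << shift);
	return 0;
}
```

With this sketch, "10k" yields 10240 and "10kb" is rejected.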

Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement should be able to also filter unneeded tree objects.

This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.

As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks.  Maintaining info
on missing blobs is currently being discussed in [3].

TODO
1. Incorporate with a patch series like [4] to dynamically fetch a
   missing blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
   to them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.


[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/


Jeff Hostetler (19):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filters: add omit-all-blobs filter
  list-objects-filters: add omit-large-blobs filter
  list-objects-filters: add use-sparse-checkout filter
  object-filter: common declarations for object filtering
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text
  upload-pack: add filter-objects to protocol documentation
  upload-pack: add object filtering
  fetch-pack: add object filtering support
  connected: add filter_allow_omitted option to API
  clone: add filter arguments
  index-pack: relax consistency checks for omitted objects
  fetch: add object filtering to fetch

 Documentation/git-pack-objects.txt                |  14 +
 Documentation/git-rev-list.txt                    |   7 +-
 Documentation/rev-list-options.txt                |  26 ++
 Documentation/technical/pack-protocol.txt         |  16 +
 Documentation/technical/protocol-capabilities.txt |   7 +
 Makefile                                          |   3 +
 builtin/clone.c                                   |  28 ++
 builtin/fetch-pack.c                              |   3 +
 builtin/fetch.c                                   |  27 +-
 builtin/index-pack.c                              |  15 +
 builtin/pack-objects.c                            |  33 +-
 builtin/rev-list.c                                |  58 +++-
 connected.c                                       |   3 +
 connected.h                                       |   6 +
 dir.c                                             |  53 +++-
 dir.h                                             |   4 +
 fetch-pack.c                                      |  28 ++
 fetch-pack.h                                      |   2 +
 list-objects-filters.c                            | 361 ++++++++++++++++++++++
 list-objects-filters.h                            |  45 +++
 list-objects.c                                    |  66 +++-
 list-objects.h                                    |  30 ++
 object-filter.c                                   | 201 ++++++++++++
 object-filter.h                                   | 145 +++++++++
 oidset2.c                                         | 101 ++++++
 oidset2.h                                         |  56 ++++
 t/t6112-rev-list-filters-objects.sh               |  37 +++
 transport.c                                       |  27 ++
 transport.h                                       |   8 +
 upload-pack.c                                     |  39 ++-
 30 files changed, 1425 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100644 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3



* [PATCH v2 01/19] dir: refactor add_excludes()
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.

Add add_excludes_from_blob_to_list() to allow an exclude
file to be specified with an OID.
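
The blob variant in this patch appends a '\n' when the blob content
does not already end with one, presumably so that the line parser
sees a terminator on the last pattern.  A stand-alone sketch of that
fix-up, using plain realloc() instead of git's xrealloc()/st_add()
wrappers, might look like this.  ensure_trailing_newline() is a
hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the newline fix-up: make sure a
 * heap buffer of `*size` bytes ends in '\n', growing it by one byte if
 * needed.  Returns the (possibly moved) buffer, or NULL on failure, in
 * which case the original buffer is untouched and still owned by the
 * caller.  An empty buffer is returned unchanged.
 */
static char *ensure_trailing_newline(char *buf, size_t *size)
{
	char *grown;

	assert(buf != NULL && size != NULL);
	if (*size == 0 || buf[*size - 1] == '\n')
		return buf;
	if (*size == (size_t)-1)
		return NULL;	/* size + 1 would overflow */

	grown = realloc(buf, *size + 1);
	if (!grown)
		return NULL;
	grown[(*size)++] = '\n';
	return grown;
}
```

Calling it twice on the same buffer is harmless: the second call finds
the newline already present and returns the buffer as-is.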

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 dir.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 dir.h |  4 ++++
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 31f9343..aeba965 100644
--- a/dir.c
+++ b/dir.c
@@ -725,6 +725,11 @@ static void invalidate_directory(struct untracked_cache *uc,
 		dir->dirs[i]->recurse = 0;
 }
 
+static int add_excludes_from_buffer(
+	char *buf, size_t size,
+	const char *base, int baselen,
+	struct exclude_list *el);
+
 /*
  * Given a file with name "fname", read it (either from disk, or from
  * the index if "check_index" is non-zero), parse it and store the
@@ -739,9 +744,9 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 			struct sha1_stat *sha1_stat)
 {
 	struct stat st;
-	int fd, i, lineno = 1;
+	int fd;
 	size_t size = 0;
-	char *buf, *entry;
+	char *buf;
 
 	fd = open(fname, O_RDONLY);
 	if (fd < 0 || fstat(fd, &st) < 0) {
@@ -798,6 +803,18 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 		}
 	}
 
+	add_excludes_from_buffer(buf, size, base, baselen, el);
+	return 0;
+}
+
+static int add_excludes_from_buffer(
+	char *buf, size_t size,
+	const char *base, int baselen,
+	struct exclude_list *el)
+{
+	int i, lineno = 1;
+	char *entry;
+
 	el->filebuf = buf;
 
 	if (skip_utf8_bom(&buf, size))
@@ -826,6 +843,38 @@ int add_excludes_from_file_to_list(const char *fname, const char *base,
 	return add_excludes(fname, base, baselen, el, check_index, NULL);
 }
 
+int add_excludes_from_blob_to_list(
+	struct object_id *oid,
+	const char *base, int baselen,
+	struct exclude_list *el)
+{
+	char *buf;
+	unsigned long size;
+	enum object_type type;
+
+	buf = read_sha1_file(oid->hash, &type, &size);
+	if (!buf)
+		return -1;
+
+	if (type != OBJ_BLOB) {
+		free(buf);
+		return -1;
+	}
+
+	if (size == 0) {
+		free(buf);
+		return 0;
+	}
+
+	if (buf[size - 1] != '\n') {
+		buf = xrealloc(buf, st_add(size, 1));
+		buf[size++] = '\n';
+	}
+
+	add_excludes_from_buffer(buf, size, base, baselen, el);
+	return 0;
+}
+
 struct exclude_list *add_exclude_list(struct dir_struct *dir,
 				      int group_type, const char *src)
 {
diff --git a/dir.h b/dir.h
index edb5fda..8e754e5 100644
--- a/dir.h
+++ b/dir.h
@@ -242,6 +242,10 @@ extern struct exclude_list *add_exclude_list(struct dir_struct *dir,
 extern int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen,
 					  struct exclude_list *el, int check_index);
 extern void add_excludes_from_file(struct dir_struct *, const char *fname);
+extern int add_excludes_from_blob_to_list(
+	struct object_id *oid,
+	const char *base, int baselen,
+	struct exclude_list *el);
 extern void parse_exclude_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen);
 extern void add_exclude(const char *string, const char *base,
 			int baselen, struct exclude_list *el, int srcpos);
-- 
2.9.3



* [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a subclass of oidset where each entry has a
field to store the length of the object's content
and an optional pathname.

This will be used in a future commit to build a
manifest of omitted objects in a partial/narrow
clone/fetch.
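
oidset2_foreach() in this patch collects the entries into an array,
sorts them by OID, and then calls the callback with (i, i_limit) so
the consumer can tell where it is in the list.  The sketch below shows
the same pattern without git's hashmap; struct omitted_entry,
foreach_sorted() and record_length() are simplified stand-ins invented
for this example, not code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an oidset2 entry; 20-byte id as in SHA-1. */
struct omitted_entry {
	unsigned char id[20];
	int64_t object_length;	/* signed; -1 when unknown */
	const char *pathname;	/* may be NULL */
};

typedef void (*omitted_cb)(int i, int i_limit,
			   const struct omitted_entry *e, void *cb_data);

/* qsort comparator over an array of entry pointers, ordered by id. */
static int cmp_entry_ptr(const void *a, const void *b)
{
	const struct omitted_entry *ea = *(const struct omitted_entry *const *)a;
	const struct omitted_entry *eb = *(const struct omitted_entry *const *)b;

	return memcmp(ea->id, eb->id, sizeof(ea->id));
}

/* Visit entries in id order.  Returns 0 on success, -1 if out of memory. */
static int foreach_sorted(const struct omitted_entry *entries, int nr,
			  omitted_cb cb, void *cb_data)
{
	const struct omitted_entry **sorted;
	int i;

	if (nr <= 0)
		return 0;
	sorted = malloc((size_t)nr * sizeof(*sorted));
	if (!sorted)
		return -1;
	for (i = 0; i < nr; i++)
		sorted[i] = &entries[i];
	qsort(sorted, (size_t)nr, sizeof(*sorted), cmp_entry_ptr);
	for (i = 0; i < nr; i++)
		cb(i, nr, sorted[i], cb_data);
	free(sorted);
	return 0;
}

/* Example callback: record each entry's length in visit order. */
static void record_length(int i, int i_limit,
			  const struct omitted_entry *e, void *cb_data)
{
	int64_t *out = cb_data;

	assert(i >= 0 && i < i_limit);
	out[i] = e->object_length;
}
```

Sorting an array of pointers rather than the entries themselves keeps
the underlying set untouched, which matches what the patch does.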

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Makefile  |   1 +
 oidset2.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 oidset2.h |  56 ++++++++++++++++++++++++++++++++++
 3 files changed, 158 insertions(+)
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h

diff --git a/Makefile b/Makefile
index ffa6da7..d590508 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/oidset2.c b/oidset2.c
new file mode 100644
index 0000000..806d153
--- /dev/null
+++ b/oidset2.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "oidset2.h"
+
+static int oidset2_hashcmp(const void *va, const void *vb,
+			  const void *vkey)
+{
+	const struct oidset2_entry *a = va, *b = vb;
+	const struct object_id *key = vkey;
+	return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *value;
+
+	if (!set->map.cmpfn)
+		return NULL;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	value = hashmap_get(&set->map, &key, oid);
+
+	return value;
+}
+
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid)
+{
+	return !!oidset2_get(set, oid);
+}
+
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname)
+{
+	struct oidset2_entry *entry;
+
+	if (!set->map.cmpfn)
+		hashmap_init(&set->map, oidset2_hashcmp, 0);
+
+	if (oidset2_contains(set, oid))
+		return 1;
+
+	entry = xcalloc(1, sizeof(*entry));
+	hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+	oidcpy(&entry->oid, oid);
+
+	entry->object_length = object_length;
+	if (pathname)
+		entry->pathname = xstrdup(pathname);
+
+	hashmap_add(&set->map, entry);
+	return 0;
+}
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *e;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	e = hashmap_remove(&set->map, &key, oid);
+
+	free(e->pathname);
+	free(e);
+}
+
+void oidset2_clear(struct oidset2 *set)
+{
+	hashmap_free(&set->map, 1);
+}
+
+static int oidset2_cmp(const void *a, const void *b)
+{
+	const struct oidset2_entry *ae = *((const struct oidset2_entry **)a);
+	const struct oidset2_entry *be = *((const struct oidset2_entry **)b);
+
+	return oidcmp(&ae->oid, &be->oid);
+}
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data)
+{
+	struct hashmap_iter iter;
+	struct oidset2_entry **array;
+	struct oidset2_entry *e;
+	int j, k;
+
+	array = xcalloc(set->map.size, sizeof(*array));
+
+	hashmap_iter_init(&set->map, &iter);
+	k = 0;
+	while ((e = hashmap_iter_next(&iter)))
+		array[k++] = e;
+
+	QSORT(array, k, oidset2_cmp);
+
+	for (j = 0; j < k; j++) {
+		e = array[j];
+		cb(j, k, e, cb_data);
+	}
+
+	free(array);
+}
diff --git a/oidset2.h b/oidset2.h
new file mode 100644
index 0000000..c498eae
--- /dev/null
+++ b/oidset2.h
@@ -0,0 +1,56 @@
+#ifndef OIDSET2_H
+#define OIDSET2_H
+
+/**
+ * oidset2 is a variant of oidset, but allows additional fields for each object.
+ */
+
+/**
+ * A single oidset2; should be zero-initialized (or use OIDSET2_INIT).
+ */
+struct oidset2 {
+	struct hashmap map;
+};
+
+#define OIDSET2_INIT { { NULL } }
+
+struct oidset2_entry {
+	struct hashmap_entry hash;
+	struct object_id oid;
+
+	int64_t object_length;	/* This is SIGNED. Use -1 when unknown. */
+	char *pathname;
+};
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname);
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid);
+
+typedef void (*oidset2_foreach_cb)(
+	int i, int i_limit,
+	struct oidset2_entry *e, void *cb_data);
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data);
+
+/**
+ * Remove all entries from the oidset2, freeing any resources associated with
+ * it.
+ */
+void oidset2_clear(struct oidset2 *set);
+
+#endif /* OIDSET2_H */
-- 
2.9.3



* [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted (not shown)
during a traversal.

Update traverse_commit_list() to be a wrapper for the above.

Filtering will be used in a future commit by rev-list and
pack-objects for narrow/partial clone/fetch to omit certain
blobs from the output.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.
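
The contract between the traversal and a filter is small: the filter
returns a bit set, and the traversal marks the object SEEN and/or
shows it accordingly.  With no filter, both bits are assumed.  The toy
model below illustrates that contract outside of git; every name in it
(toy_object, visit(), omit_blobs(), count_shown(), the FR_* and
OBJ_SEEN constants) is made up for this sketch and only mirrors the
LOFR_* idea from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum filter_result {
	FR_ZERO      = 0,
	FR_MARK_SEEN = 1 << 0,	/* do not ask about this object again */
	FR_SHOW      = 1 << 1,	/* pass the object to the show callback */
};

#define OBJ_SEEN (1u << 0)

struct toy_object {
	unsigned flags;
	int is_blob;
};

typedef int (*toy_filter_fn)(const struct toy_object *obj, void *filter_data);
typedef void (*toy_show_fn)(struct toy_object *obj, void *show_data);

/* Visit one object, letting the optional filter decide its fate. */
static void visit(struct toy_object *obj,
		  toy_show_fn show, void *show_data,
		  toy_filter_fn filter, void *filter_data)
{
	int r = FR_MARK_SEEN | FR_SHOW;	/* default when there is no filter */

	assert(obj != NULL && show != NULL);
	if (obj->flags & OBJ_SEEN)
		return;
	if (filter)
		r = filter(obj, filter_data);
	if (r & FR_MARK_SEEN)
		obj->flags |= OBJ_SEEN;
	if (r & FR_SHOW)
		show(obj, show_data);
}

/* Example filter: hard-omit blobs, show everything else. */
static int omit_blobs(const struct toy_object *obj, void *filter_data)
{
	(void)filter_data;
	return obj->is_blob ? FR_MARK_SEEN : (FR_MARK_SEEN | FR_SHOW);
}

/* Example show callback: count how many objects were shown. */
static void count_shown(struct toy_object *obj, void *show_data)
{
	(void)obj;
	++*(int *)show_data;
}
```

Returning FR_MARK_SEEN without FR_SHOW is a hard omit; returning
FR_ZERO leaves the object unseen so the filter is consulted again if
the object is reached through another path.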

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 list-objects.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--------------
 list-objects.h | 30 ++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..8dddeda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *path,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->blob_objects)
 		return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
 		die("bad blob object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	obj->flags |= SEEN;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	show(obj, path->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, path->buf, cb_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *base,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->tree_objects)
 		return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
-	obj->flags |= SEEN;
 	strbuf_addstr(base, name);
-	show(obj, base->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, base->buf, cb_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
 			process_tree(revs,
 				     lookup_tree(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.oid->hash,
 					show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
 			process_blob(revs,
 				     lookup_blob(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 	}
+
+	if (filter) {
+		r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+		if (r & LOFR_MARK_SEEN)
+			obj->flags |= SEEN;
+		if (r & LOFR_SHOW)
+			show(obj, base->buf, cb_data);
+	}
+
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+void traverse_commit_list_filtered(
+	struct rev_info *revs,
+	show_commit_fn show_commit, show_object_fn show_object, void *show_data,
+	filter_object_fn filter, void *filter_data)
 {
 	int i;
 	struct commit *commit;
@@ -200,7 +223,7 @@ void traverse_commit_list(struct rev_info *revs,
 		 */
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
+		show_commit(commit, show_data);
 	}
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
@@ -211,19 +234,19 @@ void traverse_commit_list(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, data);
+			show_object(obj, name, show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     &base, path, show_data, filter, filter_data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     &base, path, show_data, filter, filter_data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
@@ -232,3 +255,14 @@ void traverse_commit_list(struct rev_info *revs,
 	object_array_clear(&revs->pending);
 	strbuf_release(&base);
 }
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *show_data)
+{
+	traverse_commit_list_filtered(
+		revs,
+		show_commit, show_object, show_data,
+		NULL, NULL);
+}
diff --git a/list-objects.h b/list-objects.h
index 0cebf85..964e7d3 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -8,4 +8,34 @@ void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, voi
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
 
+enum list_objects_filter_result {
+	LOFR_ZERO      = 0,
+	LOFR_MARK_SEEN = 1<<0,
+	LOFR_SHOW      = 1<<1,
+};
+
+/* See object.h and revision.h */
+#define FILTER_REVISIT (1<<25)
+
+enum list_objects_filter_type {
+	LOFT_BEGIN_TREE,
+	LOFT_END_TREE,
+	LOFT_BLOB
+};
+
+typedef enum list_objects_filter_result list_objects_filter_result;
+typedef enum list_objects_filter_type list_objects_filter_type;
+
+typedef list_objects_filter_result (*filter_object_fn)(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data);
+
+void traverse_commit_list_filtered(
+	struct rev_info *,
+	show_commit_fn, show_object_fn, void *show_data,
+	filter_object_fn filter, void *filter_data);
+
 #endif
-- 
2.9.3



* [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a simple filter for traverse_commit_list_filtered() to
omit all blobs from the result.

This filter will be used in a future commit by rev-list and
pack-objects to create a "commits and trees" result.  This
is intended for a narrow/partial clone/fetch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Makefile               |  1 +
 list-objects-filters.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 17 ++++++++++
 3 files changed, 103 insertions(+)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h

diff --git a/Makefile b/Makefile
index d590508..48fdcf2 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filters.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filters.c b/list-objects-filters.c
new file mode 100644
index 0000000..f29d8bc
--- /dev/null
+++ b/list-objects-filters.c
@@ -0,0 +1,85 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filters.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ */
+struct filter_omit_all_blobs_data {
+	struct oidset2 omits;
+};
+
+static list_objects_filter_result filter_omit_all_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_all_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * Since we always omit all blobs (and never provisionally omit),
+		 * we should never see a blob twice.
+		 */
+		assert(!oidset2_contains(&filter_data->omits, &obj->oid));
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		/* Insert OID into the omitted list. No need for a pathname. */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+	}
+}
+
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data)
+{
+	struct filter_omit_all_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_all_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
new file mode 100644
index 0000000..b981020
--- /dev/null
+++ b/list-objects-filters.h
@@ -0,0 +1,17 @@
+#ifndef LIST_OBJECTS_FILTERS_H
+#define LIST_OBJECTS_FILTERS_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3



* [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_filtered() to omit
blobs larger than a requested size from the result, but always
include ".git*" special files.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.
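
The key idea is "provisional" omission: a too-large blob is recorded as
omitted but not marked SEEN, so a later visit under a ".git*" filename
can still rescue it.  The toy model below sketches that decision with
small integer ids in place of OIDs.  All names in it are invented for
illustration.  Note one deliberate difference: the sketch tests the
filename on every visit, including the first, whereas the filter as
posted only tests it for blobs that are already in the omit set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FR_ZERO = 0, FR_MARK_SEEN = 1 << 0, FR_SHOW = 1 << 1 };

#define MAX_BLOBS 64	/* arbitrary bound for the toy model */

struct large_blob_filter {
	int64_t max_bytes;
	unsigned char omitted[MAX_BLOBS];	/* 1 = provisionally omitted */
};

/* ".git" followed by at least one more character, e.g. ".gitignore". */
static int is_special_name(const char *filename)
{
	return !strncmp(filename, ".git", 4) && filename[4] != '\0';
}

/*
 * Decide what to do with one blob reached under `filename`.
 * Small blobs and special files are shown and marked seen; anything
 * else is provisionally omitted and left unseen so it can be
 * reconsidered if it shows up again under a different name.
 */
static int filter_blob(struct large_blob_filter *f, int blob_id,
		       int64_t size, const char *filename)
{
	assert(f != NULL && filename != NULL);
	assert(blob_id >= 0 && blob_id < MAX_BLOBS);

	if (size < f->max_bytes || is_special_name(filename)) {
		f->omitted[blob_id] = 0;	/* rescue if previously omitted */
		return FR_MARK_SEEN | FR_SHOW;
	}
	f->omitted[blob_id] = 1;
	return FR_ZERO;
}
```

After the traversal, whatever is still flagged in omitted[] is the
final set of omitted blobs, which is what the manifest would list.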

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 list-objects-filters.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 12 +++++++
 2 files changed, 109 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+	struct oidset2 omits;
+	int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_large_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * If previously provisionally omitted (because of size), see if the
+		 * current filename is special and force it to be included.
+		 */
+		if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+			if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+				oidset2_remove(&filter_data->omits, &obj->oid);
+				return LOFR_MARK_SEEN | LOFR_SHOW;
+			}
+			return LOFR_ZERO; /* continue provisionally omitting it */
+		}
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		if (object_length < filter_data->max_bytes)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		/*
+		 * Provisionally omit it.  We've already established that this blob
+		 * is too big and doesn't have a special filename, so we WANT to
+		 * omit it.  However, there may be a special file elsewhere in the
+		 * tree that references this same blob, so we cannot reject it yet.
+		 * Leave the LOFR_ bits unset so that if the blob appears again in
+		 * the traversal, we will be asked again.
+		 *
+		 * No need for a pathname, since we only test for special filenames
+		 * above.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit)
+{
+	struct filter_omit_large_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+	d.max_bytes = large_byte_limit;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_large_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
 	oidset2_foreach_cb print_omitted_object,
 	void *ctx_data);
 
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3
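
For readers following the omit-large-blobs rule stated in
list-objects-filters.h ("omit large blobs, but always include .git*
special files"), here is a minimal standalone sketch of that decision.
It is not the traversal code from the patch: the enum and both function
names are hypothetical, and it applies the special-name test on every
visit, whereas the patch only consults the filename when a blob is
revisited.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome names; the patch itself returns LOFR_* bit flags. */
enum blob_decision { BLOB_SHOW, BLOB_PROVISIONALLY_OMIT };

/* Same test as the patch: starts with ".git" and has at least one more char. */
static int is_special_filename(const char *filename)
{
	assert(filename);
	return strncmp(filename, ".git", 4) == 0 && filename[4] != '\0';
}

/* Blobs strictly smaller than max_bytes are shown; special names always are. */
static enum blob_decision decide_blob(const char *filename, int64_t size,
				      int64_t max_bytes)
{
	if (is_special_filename(filename))
		return BLOB_SHOW;
	if (size < max_bytes)
		return BLOB_SHOW;
	return BLOB_PROVISIONALLY_OMIT;
}
```

Note that a blob of exactly max_bytes is omitted, matching the
"object_length < max_bytes" comparison in the patch.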



* [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_filtered() to omit the
blobs that would not be needed by a sparse checkout using the
given sparse-checkout spec.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

A future enhancement should be able to also omit tree objects
not needed by such a sparse checkout, but that is not currently
supported.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
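The per-directory bookkeeping this filter relies on can be sketched
outside of Git as a small stack of frames. This is only an illustration
under stated assumptions: struct frame_stack and the stack_*() helpers
are hypothetical names, plain realloc() stands in for ALLOC_GROW(), and
"top" plays the role of filter_data->nr (the index of the current frame).

```c
#include <assert.h>
#include <stdlib.h>

struct frame {
	int defval;                   /* match result inherited by children */
	unsigned child_prov_omit : 1; /* a child was provisionally omitted */
};

struct frame_stack {
	struct frame *frames;
	size_t top;   /* index of the current (innermost) frame */
	size_t alloc; /* number of allocated frames */
};

/* Set up the root frame.  Returns 0 on success, -1 on allocation failure. */
static int stack_init(struct frame_stack *s, int root_defval)
{
	s->alloc = 4;
	s->frames = calloc(s->alloc, sizeof(*s->frames));
	if (!s->frames)
		return -1;
	s->top = 0;
	s->frames[0].defval = root_defval;
	return 0;
}

/* Enter a directory.  match < 0 means "undecided": inherit from the parent. */
static int stack_push(struct frame_stack *s, int match)
{
	if (match < 0)
		match = s->frames[s->top].defval;
	if (s->top + 2 > s->alloc) { /* index top + 1 needs top + 2 slots */
		size_t n = s->alloc * 2;
		struct frame *p = realloc(s->frames, n * sizeof(*p));
		if (!p)
			return -1;
		s->frames = p;
		s->alloc = n;
	}
	s->top++;
	s->frames[s->top].defval = match;
	s->frames[s->top].child_prov_omit = 0;
	return 0;
}

/* Leave a directory.  Returns 1 if it may be marked SEEN (nothing omitted). */
static int stack_pop(struct frame_stack *s)
{
	unsigned omitted;

	assert(s->top > 0);
	omitted = s->frames[s->top].child_prov_omit;
	s->top--;
	s->frames[s->top].child_prov_omit |= omitted; /* tell the parent */
	return !omitted;
}
```

A directory is only safe to mark SEEN when stack_pop() returns 1; any
provisional omit below it propagates up to every enclosing directory.
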
 list-objects-filters.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h |  16 +++++
 2 files changed, 195 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+	int defval;
+	unsigned child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+	struct oidset2 omits;
+	struct exclude_list el;
+
+	size_t nr, alloc;
+	struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_use_sparse_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	int val, dtype;
+	unsigned long s;
+	enum object_type t;
+	struct frame *frame;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		dtype = DT_DIR;
+		val = is_excluded_from_list(pathname, strlen(pathname),
+					    filename, &dtype, &filter_data->el);
+		if (val < 0)
+			val = filter_data->array_frame[filter_data->nr].defval;
+
+		ALLOC_GROW(filter_data->array_frame, filter_data->nr + 2,
+			   filter_data->alloc);
+		filter_data->nr++;
+		filter_data->array_frame[filter_data->nr].defval = val;
+		filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+		/*
+		 * A directory with this tree OID may appear in multiple
+		 * places in the tree. (Think of a directory move, with
+		 * no other changes.)  And with a different pathname, the
+		 * is_excluded...() results for this directory and items
+		 * contained within it may be different.  So we cannot
+		 * mark it SEEN (yet), since that will prevent process_tree()
+		 * from revisiting this tree object with other pathnames.
+		 *
+		 * Only SHOW the tree object the first time we visit this
+		 * tree object.
+		 *
+		 * We always show all tree objects.  A future optimization
+		 * may want to attempt to narrow this.
+		 */
+		if (obj->flags & FILTER_REVISIT)
+			return LOFR_ZERO;
+		obj->flags |= FILTER_REVISIT;
+		return LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		assert(filter_data->nr > 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+		filter_data->nr--;
+
+		/*
+		 * Tell our parent directory if any of our children were
+		 * provisionally omitted.
+		 */
+		filter_data->array_frame[filter_data->nr].child_prov_omit |=
+			frame->child_prov_omit;
+
+		/*
+		 * If there are NO provisionally omitted child objects (ALL child
+		 * objects in this folder were INCLUDED), then we can mark the
+		 * folder as SEEN (so we will not have to revisit it again).
+		 */
+		if (!frame->child_prov_omit)
+			return LOFR_MARK_SEEN;
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+
+		/*
+		 * If we previously provisionally omitted this blob because
+		 * its pathname was not in the sparse-checkout AND this
+		 * reference to the blob has the same pathname, we can avoid
+		 * repeating the exclusion logic on this pathname and just
+		 * continue to provisionally omit it.
+		 */
+		if (obj->flags & FILTER_REVISIT) {
+			struct oidset2_entry *entry_prev;
+			entry_prev = oidset2_get(&filter_data->omits, &obj->oid);
+			if (entry_prev && !strcmp(pathname, entry_prev->pathname)) {
+				frame->child_prov_omit = 1;
+				return LOFR_ZERO;
+			}
+		}
+
+		dtype = DT_REG;
+		val = is_excluded_from_list(pathname, strlen(pathname),
+					    filename, &dtype, &filter_data->el);
+		if (val < 0)
+			val = frame->defval;
+		if (val > 0)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		/*
+		 * Provisionally omit it.  We've already established that
+		 * this pathname is not in the sparse-checkout specification,
+		 * so we WANT to omit this blob.  However, a pathname elsewhere
+		 * in the tree may also reference this same blob, so we cannot
+		 * reject it yet.  Leave the LOFR_ bits unset so that if the
+		 * blob appears again in the traversal, we will be asked again.
+		 *
+		 * The pathname we associate with this omit is just the first
+		 * one we saw for this blob.  Other instances of this blob may
+		 * have other pathnames and that is fine.  We just use it for
+		 * perf because most of the time, the blob will be in the same
+		 * place as we walk the commits.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       pathname);
+		obj->flags |= FILTER_REVISIT;
+		frame->child_prov_omit = 1;
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_use_sparse(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	struct object_id *oid)
+{
+	struct filter_use_sparse_data d;
+
+	memset(&d, 0, sizeof(d));
+	if (add_excludes_from_blob_to_list(oid, NULL, 0, &d.el) < 0)
+		die("filter_use_sparse could not load specification");
+	ALLOC_GROW(d.array_frame, d.nr + 1, d.alloc);
+	d.array_frame[d.nr].defval = 0; /* default to include */
+	d.array_frame[d.nr].child_prov_omit = 0;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_use_sparse, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index 32b2833..52e507b 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -26,4 +26,20 @@ void traverse_commit_list_omit_large_blobs(
 	void *ctx_data,
 	int64_t large_byte_limit);
 
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+void traverse_commit_list_use_sparse(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	struct object_id *oid);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3



* [PATCH v2 07/19] object-filter: common declarations for object filtering
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create common routines and defines for parsing
object-filter-related command line arguments and
pack-protocol fields.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
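The two spellings handled by this patch, "--<key>=<value>" on the
command line and "<key> <value>" on a protocol line, can be sketched
with a single prefix-skipping helper. The code below is a standalone
illustration, not the patch's implementation: skip_prefix_sketch() is a
local stand-in for Git's skip_prefix(), and KEY_OMIT_LARGE and the two
value_from_*() functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for Git's skip_prefix() helper. */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t n;

	assert(str && prefix && out);
	n = strlen(prefix);
	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

#define KEY_OMIT_LARGE "filter-omit-large-blobs"

/* Command-line form: "--<key>=<value>".  Returns the value or NULL. */
static const char *value_from_cli_arg(const char *arg)
{
	const char *v;

	return skip_prefix_sketch(arg, "--" KEY_OMIT_LARGE "=", &v) ? v : NULL;
}

/* Protocol form: "<key> <value>".  Returns the value or NULL. */
static const char *value_from_protocol_line(const char *line)
{
	const char *v;

	return skip_prefix_sketch(line, KEY_OMIT_LARGE " ", &v) ? v : NULL;
}
```

Sharing the key string between both forms is the point of the
CL_ARG_* defines: only the leading dashes and the separator differ.
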
 Makefile        |   1 +
 object-filter.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 object-filter.h | 145 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 347 insertions(+)
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h

diff --git a/Makefile b/Makefile
index 48fdcf2..daa9ea2 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += object-filter.o
 LIB_OBJS += oidset.o
 LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
diff --git a/object-filter.c b/object-filter.c
new file mode 100644
index 0000000..5be6129
--- /dev/null
+++ b/object-filter.c
@@ -0,0 +1,201 @@
+#include "cache.h"
+#include "commit.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "oidset2.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
+
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options)
+{
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->omit_all_blobs = 1;
+	return 0;
+}
+
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+				  const char *arg)
+{
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->omit_large_blobs = 1;
+
+	/* we allow "<digits>[kmg]" */
+	if (!git_parse_ulong(arg, &filter_options->large_byte_limit))
+		die(_("invalid size limit for large object filter"));
+
+	filter_options->large_byte_limit_string = xstrdup(arg);
+	return 0;
+}
+
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+			    const char *arg)
+{
+	struct object_context oc;
+
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->use_sparse = 1;
+
+	/*
+	 * The command line argument needs to resolve to a known OID
+	 * representing the content of the desired sparse-checkout file.
+	 * We allow various syntax forms for the convenience of the user.
+	 * See sha1_name.c:get_sha1_with_context_1().
+	 *
+	 * Try to evaluate the arg locally in case they use one of the
+	 * convenience patterns.  This must resolve to a blob.
+	 */
+	if (get_sha1_with_context(arg, GET_SHA1_BLOB,
+				  filter_options->sparse_oid.hash, &oc)) {
+		/*
+		 * If that fails, keep the original string in case a client
+		 * command wants to send it to the server.  This allows the
+		 * client to name an OID for a blob they don't have.
+		 */
+		filter_options->sparse_value = xstrdup(arg);
+		oidcpy(&filter_options->sparse_oid, &null_oid);
+	} else {
+		/*
+		 * Round-trip the found OID to normalize it.
+		 */
+		filter_options->sparse_value =
+			xstrdup(oid_to_hex(&filter_options->sparse_oid));
+	}
+
+	return 0;
+}
+
+int parse_filter_print_manifest(struct object_filter_options *filter_options)
+{
+	filter_options->print_manifest = 1;
+	return 0;
+}
+
+int parse_filter_relax(struct object_filter_options *filter_options)
+{
+	filter_options->relax = 1;
+	return 0;
+}
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+				    const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_omit_all_blobs(filter_options);
+}
+
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+				      const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(arg);
+	assert(!unset);
+
+	return parse_filter_omit_large_blobs(filter_options, arg);
+}
+
+int opt_parse_filter_use_sparse(const struct option *opt,
+				const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(arg);
+	assert(!unset);
+
+	return parse_filter_use_sparse(filter_options, arg);
+}
+
+int opt_parse_filter_print_manifest(const struct option *opt,
+				    const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_print_manifest(filter_options);
+}
+
+int opt_parse_filter_relax(const struct option *opt,
+			   const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_relax(filter_options);
+}
+
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+				 const char *arg,
+				 int allow_print_manifest,
+				 int allow_relax)
+{
+	if (!strcmp(arg, ("--"CL_ARG_FILTER_OMIT_ALL_BLOBS))) {
+		parse_filter_omit_all_blobs(filter_options);
+		return 1;
+	}
+	if (skip_prefix(arg, ("--"CL_ARG_FILTER_OMIT_LARGE_BLOBS"="), &arg)) {
+		parse_filter_omit_large_blobs(filter_options, arg);
+		return 1;
+	}
+	if (skip_prefix(arg, ("--"CL_ARG_FILTER_USE_SPARSE"="), &arg)) {
+		parse_filter_use_sparse(filter_options, arg);
+		return 1;
+	}
+
+	if (allow_print_manifest &&
+	    !strcmp(arg, ("--"CL_ARG_FILTER_PRINT_MANIFEST))) {
+		parse_filter_print_manifest(filter_options);
+		return 1;
+	}
+
+	if (allow_relax && !strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+		parse_filter_relax(filter_options);
+		return 1;
+	}
+
+	return 0;
+}
+
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+				      const char *arg,
+				      int allow_print_manifest,
+				      int allow_relax)
+{
+	if (!strcmp(arg, CL_ARG_FILTER_OMIT_ALL_BLOBS)) {
+		parse_filter_omit_all_blobs(filter_options);
+		return 1;
+	}
+	if (skip_prefix(arg, (CL_ARG_FILTER_OMIT_LARGE_BLOBS" "), &arg)) {
+		parse_filter_omit_large_blobs(filter_options, arg);
+		return 1;
+	}
+	if (skip_prefix(arg, (CL_ARG_FILTER_USE_SPARSE" "), &arg)) {
+		parse_filter_use_sparse(filter_options, arg);
+		return 1;
+	}
+
+	if (allow_print_manifest &&
+	    !strcmp(arg, CL_ARG_FILTER_PRINT_MANIFEST)) {
+		parse_filter_print_manifest(filter_options);
+		return 1;
+	}
+	if (allow_relax && !strcmp(arg, CL_ARG_FILTER_RELAX)) {
+		parse_filter_relax(filter_options);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/object-filter.h b/object-filter.h
new file mode 100644
index 0000000..f1ca5fb
--- /dev/null
+++ b/object-filter.h
@@ -0,0 +1,145 @@
+#ifndef OBJECT_FILTER_H
+#define OBJECT_FILTER_H
+
+#include "parse-options.h"
+
+/*
+ * Common declarations and utilities for filtering objects (such as omitting
+ * large blobs) during fetch-pack, upload-pack, and the pack-protocol.  These
+ * are intended for partial/narrow clone/fetch.
+ */
+
+struct object_filter_options {
+	/*
+	 * blob-ish path or value that get_sha1_with_context() can turn into
+	 * an OID to find the blob containing the sparse-checkout specification.
+	 * only used when use_sparse is set.
+	 */
+	const char *sparse_value;
+	struct object_id sparse_oid;
+
+	/*
+	 * blob size byte limit for filtering.  only blobs smaller than this
+	 * value will be included.  a value of zero omits all blobs.
+	 * only used when omit_large_blobs is set.  Integer and string versions
+	 * of this are kept for convenience.
+	 */
+	unsigned long large_byte_limit;
+	const char *large_byte_limit_string;
+
+	/* valid filter types (only one may be used at a time) */
+	unsigned omit_all_blobs : 1;
+	unsigned omit_large_blobs : 1;
+	unsigned use_sparse : 1;
+
+	/* true if the filter should output a manifest of the omitted objects. */
+	unsigned print_manifest : 1;
+
+	/* true to suppress missing object errors during consistency checks */
+	unsigned relax : 1;
+};
+
+/*
+ * Return true if a filter is enabled.
+ */
+static inline int object_filter_enabled(const struct object_filter_options *p)
+{
+	return p->omit_all_blobs || p->omit_large_blobs || p->use_sparse;
+}
+
+/* See Documentation/technical/protocol-capabilities.txt */
+#define PROTOCOL_CAPABILITY_FILTER_OBJECTS         "filter-objects"
+
+/* See Documentation/technical/pack-protocol.txt */
+#define PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS     "filter-omit-all-blobs"
+#define PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS   "filter-omit-large-blobs"
+#define PROTOCOL_REQUEST_FILTER_USE_SPARSE         "filter-use-sparse"
+
+/* Normalized command line arguments */
+#define CL_ARG_FILTER_OMIT_ALL_BLOBS     "filter-omit-all-blobs"
+#define CL_ARG_FILTER_OMIT_LARGE_BLOBS   "filter-omit-large-blobs"
+#define CL_ARG_FILTER_USE_SPARSE         "filter-use-sparse"
+#define CL_ARG_FILTER_PRINT_MANIFEST     "filter-print-manifest"
+#define CL_ARG_FILTER_RELAX              "filter-relax"
+
+/*
+ * Common command line argument parsing for object-filter-related
+ * arguments (whether from a hand-parsed or parse-options style
+ * parser).
+ */
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options);
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+				  const char *arg);
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+			    const char *arg);
+int parse_filter_print_manifest(struct object_filter_options *filter_options);
+int parse_filter_relax(struct object_filter_options *filter_options);
+
+/*
+ * Common command line argument parsers for object-filter-related
+ * arguments coming from parse-options style parsers.
+ */
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+				    const char *arg, int unset);
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+				      const char *arg, int unset);
+int opt_parse_filter_use_sparse(const struct option *opt,
+				const char *arg, int unset);
+int opt_parse_filter_print_manifest(const struct option *opt,
+				    const char *arg, int unset);
+int opt_parse_filter_relax(const struct option *opt,
+			   const char *arg, int unset);
+
+#define OPT_PARSE_FILTER_OMIT_ALL_BLOBS(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_ALL_BLOBS, fo, NULL, \
+	  N_("omit all blobs from result"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+	  opt_parse_filter_omit_all_blobs }
+
+#define OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_LARGE_BLOBS, fo, N_("size"), \
+	  N_("omit large blobs from result"), PARSE_OPT_NONEG, \
+	  opt_parse_filter_omit_large_blobs }
+
+#define OPT_PARSE_FILTER_USE_SPARSE(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_USE_SPARSE, fo, N_("object"), \
+	  N_("filter results using sparse-checkout specification"), PARSE_OPT_NONEG, \
+	  opt_parse_filter_use_sparse }
+
+#define OPT_PARSE_FILTER_PRINT_MANIFEST(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_PRINT_MANIFEST, fo, NULL,	\
+	  N_("print manifest of omitted objects"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+	  opt_parse_filter_print_manifest }
+
+#define OPT_PARSE_FILTER_RELAX(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_RELAX, fo, NULL, \
+	  N_("relax consistency checks for previously omitted objects"), \
+	  PARSE_OPT_NOARG | PARSE_OPT_NONEG, opt_parse_filter_relax }
+
+/*
+ * Hand parse known object-filter command line options.
+ * Use this when the caller DOES NOT use the normal OPT_
+ * routines.
+ *
+ * Here we assume args of the form "--<key>" or "--<key>=<value>".
+ * Note the literal dash-dash and equals.
+ *
+ * Returns 1 if we handled the argument.
+ */
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+				 const char *arg,
+				 int allow_print_manifest,
+				 int allow_relax);
+
+/*
+ * Hand parse known object-filter protocol lines.
+ *
+ * Here we assume args of the form "<key>" or "<key> <value>".
+ * Note the literal space between the key and value.
+ */
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+				      const char *arg,
+				      int allow_print_manifest,
+				      int allow_relax);
+
+#endif /* OBJECT_FILTER_H */
-- 
2.9.3



* [PATCH v2 08/19] rev-list: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach rev-list to use the filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

This feature is only enabled when one of the "--objects*"
options is used.

When the "--filter-print-manifest" option is used, the
omitted objects and their sizes are printed at the end.
These are marked with a "~".  This can be combined with
"--quiet" to get a list of just the omitted objects.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
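The manifest line format described above ("~<oid>" followed by the size
when it is known) can be sketched as a small formatter. This is an
illustration only: format_manifest_line() is a hypothetical helper that
takes the hex OID as a string, and it assumes a 40-character SHA-1 hex
name and a size of -1 meaning "unknown", as in print_omitted_object().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one manifest line into buf: "~<hex-oid>" when the size is
 * unknown (-1), otherwise "~<hex-oid> <size>".  Returns snprintf()'s
 * result.
 */
static int format_manifest_line(char *buf, size_t len,
				const char *hex_oid, int64_t size)
{
	assert(buf && hex_oid);
	assert(strlen(hex_oid) == 40); /* SHA-1 hex name */

	if (size == -1)
		return snprintf(buf, len, "~%s\n", hex_oid);
	/* PRIuMAX expects a uintmax_t argument, hence the cast */
	return snprintf(buf, len, "~%s %" PRIuMAX "\n", hex_oid,
			(uintmax_t)size);
}
```
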
 builtin/rev-list.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bcf77f0..fd9a7e5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
 #include "pack.h"
 #include "pack-bitmap.h"
 #include "builtin.h"
@@ -52,6 +54,7 @@ static const char rev_list_usage[] =
 
 static struct progress *progress;
 static unsigned progress_counter;
+static struct object_filter_options filter_options;
 
 static void finish_commit(struct commit *commit, void *data);
 static void show_commit(struct commit *commit, void *data)
@@ -178,8 +181,20 @@ static void finish_commit(struct commit *commit, void *data)
 static void finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+		if (filter_options.relax) {
+			/*
+			 * Relax consistency checks to not complain about
+			 * omitted objects (presumably caused by a previous
+			 * use of the 'filter-objects' feature).
+			 *
+			 * Note that this is independent of any filtering that
+			 * we are doing in this run.
+			 */
+			return;
+		}
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(obj->oid.hash);
 }
@@ -199,6 +214,16 @@ static void show_edge(struct commit *commit)
 	printf("-%s\n", oid_to_hex(&commit->object.oid));
 }
 
+static void print_omitted_object(int i, int i_limit, struct oidset2_entry *e, void *cb_data)
+{
+	/* struct rev_list_info *info = cb_data; */
+
+	if (e->object_length == -1)
+		printf("~%s\n", oid_to_hex(&e->oid));
+	else
+		printf("~%s %"PRIuMAX"\n", oid_to_hex(&e->oid), (uintmax_t)e->object_length);
+}
+
 static void print_var_str(const char *var, const char *val)
 {
 	printf("%s='%s'\n", var, val);
@@ -276,6 +301,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
 	const char *show_progress = NULL;
+	oidset2_foreach_cb fn_filter_print = NULL;
 
 	git_config(git_default_config, NULL);
 	init_revisions(&revs, prefix);
@@ -329,6 +355,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
+		if (object_filter_hand_parse_arg(&filter_options, arg, 1, 1)) {
+			if (!revs.blob_objects)
+				die(_("object filtering requires --objects"));
+			if (filter_options.use_sparse &&
+			    !oidcmp(&filter_options.sparse_oid, &null_oid))
+				die(_("invalid sparse value"));
+			continue;
+		}
 		usage(rev_list_usage);
 
 	}
@@ -353,6 +387,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
+	if (object_filter_enabled(&filter_options)) {
+		if (use_bitmap_index)
+			die(_("cannot combine --use-bitmap-index with object filtering"));
+	}
+
 	save_commit_buffer = (revs.verbose_header ||
 			      revs.grep_filter.pattern_list ||
 			      revs.grep_filter.header_list);
@@ -397,7 +436,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
-	traverse_commit_list(&revs, show_commit, show_object, &info);
+	if (filter_options.print_manifest)
+		fn_filter_print = print_omitted_object;
+
+	if (filter_options.omit_all_blobs)
+		traverse_commit_list_omit_all_blobs(
+			&revs, show_commit, show_object, fn_filter_print, &info);
+	else if (filter_options.omit_large_blobs)
+		traverse_commit_list_omit_large_blobs(
+			&revs, show_commit, show_object, fn_filter_print, &info,
+			(int64_t)(uint64_t)filter_options.large_byte_limit);
+	else if (filter_options.use_sparse)
+		traverse_commit_list_use_sparse(
+			&revs, show_commit, show_object, fn_filter_print, &info,
+			&filter_options.sparse_oid);
+	else
+		traverse_commit_list(&revs, show_commit, show_object, &info);
 
 	stop_progress(&progress);
 
-- 
2.9.3



* [PATCH v2 09/19] rev-list: add filtering help text
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
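The "<n>[kmg]" size syntax documented in this patch can be illustrated
with a standalone parser. This is a sketch, not the code rev-list uses
(the series calls git_parse_ulong() for that): parse_size_sketch() is a
hypothetical name, and it assumes 1024-based units and a
case-insensitive suffix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<digits>[kmg]" into bytes.  Returns 1 on success, 0 on error. */
static int parse_size_sketch(const char *arg, uint64_t *out)
{
	char *end;
	unsigned long long n;
	uint64_t factor = 1;

	assert(arg && out);
	if (*arg < '0' || *arg > '9')
		return 0; /* reject empty strings, signs and leading spaces */
	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno)
		return 0; /* out of range for strtoull() */
	switch (*end) {
	case 'k': case 'K': factor = UINT64_C(1) << 10; end++; break;
	case 'm': case 'M': factor = UINT64_C(1) << 20; end++; break;
	case 'g': case 'G': factor = UINT64_C(1) << 30; end++; break;
	}
	if (*end != '\0')
		return 0; /* trailing garbage such as "10x" or "1kb" */
	if (n > UINT64_MAX / factor)
		return 0; /* multiplication would overflow */
	*out = n * factor;
	return 1;
}
```

A value of zero parses successfully, which matches the documented
"Value may be zero" behavior.
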
 Documentation/git-rev-list.txt     |  7 ++++++-
 Documentation/rev-list-options.txt | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..d20c2ab 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,12 @@ SYNOPSIS
 	     [ --fixed-strings | -F ]
 	     [ --date=<format>]
 	     [ [ --objects | --objects-edge | --objects-edge-aggressive ]
-	       [ --unpacked ] ]
+	       [ --unpacked ]
+	       [ [ --filter-omit-all-blobs |
+		   --filter-omit-large-blobs=<n>[kmg] |
+		   --filter-use-sparse=<object> ]
+		 [ --filter-print-manifest ] ] ]
+	     [ --filter-relax ]
 	     [ --pretty | --header ]
 	     [ --bisect ]
 	     [ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f732..e0112dd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -693,6 +693,32 @@ ifdef::git-rev-list[]
 --unpacked::
 	Only useful with `--objects`; print the object IDs that are not
 	in packs.
+
+--filter-omit-all-blobs::
+	Only useful with one of the `--objects*` options; omits all blobs from
+	the printed list of objects.
+
+--filter-omit-large-blobs=<n>[kmg]::
+	Only useful with one of the `--objects*` options; omits blobs of
+	n bytes or more from the printed list of objects.  May optionally be
+	followed by 'k', 'm', or 'g' units.  Value may be zero.  Special
+	files (matching ".git*") are always included, regardless of size.
+
+--filter-use-sparse=<object>::
+	Only useful with one of the `--objects*` options; uses a sparse-checkout
+	specification contained in the given object to filter the result
+	by omitting blobs that would not be used by the corresponding
+	sparse checkout.
+
+--filter-print-manifest::
+	Only useful with one of the above `--filter*` options; prints a manifest
+	of the omitted objects.  Object IDs are prefixed with a ``~''
+	character.  The object size is printed after the ID.
+
+--filter-relax::
+	Relax consistency checking for missing blobs.  Do not warn of
+	missing blobs during normal (non-filtering) object traversal
+	following an earlier partial/narrow clone or fetch.
 endif::git-rev-list[]
 
 --no-walk[=(sorted|unsorted)]::
-- 
2.9.3



* [PATCH v2 10/19] t6112: rev-list object filtering test
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (8 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t6112-rev-list-filters-objects.sh | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 t/t6112-rev-list-filters-objects.sh

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
new file mode 100644
index 0000000..ded2b04
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	for n in 1 2 3 4 5 ; do \
+		echo $n > file.$n ; \
+		git add file.$n ; \
+		git commit -m "$n" ; \
+	done
+'
+
+test_expect_success 'omit-all-blobs omitted 5 blobs' '
+	git rev-list HEAD --objects --filter-print-manifest --filter-omit-all-blobs >omit_all &&
+	grep "^~" omit_all >omitted &&
+	test_line_count = 5 omitted
+'
+
+test_expect_success 'omit-all-blobs blob sha match' '
+	git rev-list HEAD --objects >normal &&
+	awk "/file/ {print \$1;}" <normal | sort >normal_sha &&
+	sed "s/^~//" <omitted | awk "{print \$1;}" | sort >omit_all_sha &&
+	test_cmp normal_sha omit_all_sha
+'
+
+test_expect_success 'omit-all-blobs nothing else changed' '
+	grep -v "file" <normal | sort >normal_other &&
+	grep -v "^~" <omit_all | sort >omit_other &&
+	test_cmp normal_other omit_other
+'
+
+# TODO test filter-omit-large-blobs
+# TODO test filter-use-sparse
+
+test_done
-- 
2.9.3
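
For readers who want to consume the manifest outside of the test suite:
the test above relies on omitted blobs being printed as "~<sha> <size>",
as shown in the cover letter.  Below is a minimal sketch of a parser for
one such line.  parse_omitted_line() and HEX_LEN are hypothetical names
used only for illustration; nothing like them is part of this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define HEX_LEN 40	/* SHA-1 hex length, as in the sample output */

/*
 * Hypothetical helper: if `line` is an omitted-blob entry ("~", 40 hex
 * digits, SP, decimal size), copy the hex id into `hex` (at least 41
 * bytes) and the size into *size, then return 1.  Return 0 otherwise.
 */
int parse_omitted_line(const char *line, char *hex, unsigned long *size)
{
	unsigned long value;
	char *end;
	size_t i;

	assert(line && hex && size);
	if (line[0] != '~')
		return 0;
	for (i = 0; i < HEX_LEN; i++)
		if (!isxdigit((unsigned char)line[1 + i]))
			return 0;	/* also catches a short line */
	if (line[1 + HEX_LEN] != ' ' ||
	    !isdigit((unsigned char)line[2 + HEX_LEN]))
		return 0;

	value = strtoul(line + 2 + HEX_LEN, &end, 10);
	if (*end && *end != '\n')
		return 0;	/* trailing garbage after the size */

	memcpy(hex, line + 1, HEX_LEN);
	hex[HEX_LEN] = '\0';
	*size = value;
	return 1;
}
```

Lines for commits, trees and retained blobs do not start with "~", so
the helper returns 0 for them and leaves the outputs untouched.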



* [PATCH v2 11/19] pack-objects: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (9 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach pack-objects to use filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

This feature is intended for narrow/partial clone/fetch.

Filtering requires the "--stdout" option.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/pack-objects.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 50e01aa..614ad60 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
 #include "pack-objects.h"
 #include "progress.h"
 #include "refs.h"
@@ -77,6 +79,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static struct object_filter_options filter_options;
+
 /*
  * stats
  */
@@ -2800,7 +2804,20 @@ static void get_object_list(int ac, const char **av)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(&revs, show_edge);
-	traverse_commit_list(&revs, show_commit, show_object, NULL);
+
+	if (filter_options.omit_all_blobs)
+		traverse_commit_list_omit_all_blobs(
+			&revs, show_commit, show_object, NULL, NULL);
+	else if (filter_options.omit_large_blobs)
+		traverse_commit_list_omit_large_blobs(
+			&revs, show_commit, show_object, NULL, NULL,
+			(int64_t)(uint64_t)filter_options.large_byte_limit);
+	else if (filter_options.use_sparse)
+		traverse_commit_list_use_sparse(
+			&revs, show_commit, show_object, NULL, NULL,
+			&filter_options.sparse_oid);
+	else
+		traverse_commit_list(&revs, show_commit, show_object, NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
@@ -2936,6 +2953,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("use a bitmap index if available to speed up counting objects")),
 		OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 			 N_("write a bitmap index together with the pack index")),
+
+		OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+		OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+		OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+		/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+		/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 		OPT_END(),
 	};
 
@@ -3007,6 +3032,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!pack_to_stdout && thin)
 		die("--thin cannot be used to build an indexable pack.");
 
+	if (!pack_to_stdout && object_filter_enabled(&filter_options))
+		die("object filtering cannot be used when building an indexable pack.");
+
 	if (keep_unreachable && unpack_unreachable)
 		die("--keep-unreachable and --unpack-unreachable are incompatible.");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
@@ -3031,6 +3059,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
 		use_bitmap_index = 0;
 
+	if (object_filter_enabled(&filter_options))
+		use_bitmap_index = 0;
+
 	if (pack_to_stdout || !rev_list_all)
 		write_bitmap_index = 0;
 
-- 
2.9.3
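
As a rough illustration of the option rules in the patch above (a
filter requires --stdout, and an active filter turns off bitmap index
use), here is a stand-alone sketch.  struct filter_flags,
filter_enabled() and check_pack_options() are hypothetical stand-ins
and are not the series' struct object_filter_options or
object_filter_enabled().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the series' struct object_filter_options. */
struct filter_flags {
	int omit_all_blobs;
	int omit_large_blobs;
	int use_sparse;
};

/* Nonzero when any of the three filters was requested. */
int filter_enabled(const struct filter_flags *f)
{
	assert(f != NULL);
	return f->omit_all_blobs || f->omit_large_blobs || f->use_sparse;
}

/*
 * Returns NULL when the combination is acceptable, or an error message.
 * On success, *use_bitmap_index is cleared when a filter is active.
 */
const char *check_pack_options(const struct filter_flags *f,
			       int pack_to_stdout, int *use_bitmap_index)
{
	assert(use_bitmap_index != NULL);
	if (!pack_to_stdout && filter_enabled(f))
		return "object filtering cannot be used when building an indexable pack.";
	if (filter_enabled(f))
		*use_bitmap_index = 0;
	return NULL;
}
```

The real code calls die() instead of returning a message; returning the
string here just keeps the sketch easy to exercise in isolation.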



* [PATCH v2 12/19] pack-objects: add filtering help text
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (10 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Update pack-objects help text to describe object filtering.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/git-pack-objects.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 8973510..084641f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -231,6 +231,20 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
 	With this option, parents that are hidden by grafts are packed
 	nevertheless.
 
+--filter-omit-all-blobs::
+	Omits all blobs from the packfile.  This option requires --stdout.
+
+--filter-omit-large-blobs=<n>[kmg]::
+	Omits blobs larger than <n> bytes from the packfile.  May optionally be
+	followed by 'k', 'm', or 'g' units.  Value may be zero.  Special
+	files (matching ".git*") are always included, regardless of size.
+	This option requires --stdout.
+
+--filter-use-sparse=<object>::
+	Uses a sparse-checkout specification given by <object> to filter
+	the result by omitting blobs that would not be used by the
+	corresponding sparse checkout.  This option requires --stdout.
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
-- 
2.9.3
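
To make the "<n>[kmg]" syntax documented above concrete, here is a
small stand-alone sketch of a parser.  It is not the code this series
uses (the transport patch calls git_parse_ulong()); parse_magnitude()
is a hypothetical name, and the sketch assumes lowercase suffixes only
and powers of 1024 for the units.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "<n>[kmg]" into a byte count.
 * Returns 0 on success, -1 on malformed input or overflow.
 */
int parse_magnitude(const char *s, unsigned long *out)
{
	unsigned long value, factor = 1;
	char *end;

	assert(s && out);
	if (!isdigit((unsigned char)*s))
		return -1;	/* need at least one digit, no sign */

	errno = 0;
	value = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	/* Assumption: units are powers of 1024, lowercase only. */
	switch (*end) {
	case 'k': factor = 1024UL; end++; break;
	case 'm': factor = 1024UL * 1024; end++; break;
	case 'g': factor = 1024UL * 1024 * 1024; end++; break;
	}
	if (*end)
		return -1;	/* trailing garbage */
	if (value > ULONG_MAX / factor)
		return -1;	/* product would overflow */

	*out = value * factor;
	return 0;
}
```

With these assumptions "10k" yields 10240 and "0" is accepted, which
matches the "Value may be zero" wording in the help text.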



* [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (11 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/pack-protocol.txt         | 16 ++++++++++++++++
 Documentation/technical/protocol-capabilities.txt |  7 +++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a349171..dce6e04 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       [filter-request]
 		       flush-pkt
 
   want-list         =  first-want
@@ -226,7 +227,13 @@ out of what the server said it could do with the first 'want' line.
   first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)
 
+  filter-request    =  PKT-LINE("filter-omit-all-blobs") /
+		       PKT-LINE("filter-omit-large-blobs" SP magnitude) /
+		       PKT-LINE("filter-use-sparse" SP obj-id)
+
   depth             =  1*DIGIT
+
+  magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +256,15 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial/narrow clone/fetch operations.
+"filter-omit-all-blobs" requests that all blobs be omitted from
+the packfile.  "filter-omit-large-blobs" requests that blobs larger
+than the requested size be omitted, unless they have a ".git*"
+special filename.  "filter-use-sparse" requests blob filtering based
+upon a sparse-checkout specification in the given blob id.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..7011eb3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter-objects
+--------------
+
+If the upload-pack server advertises the 'filter-objects' capability,
+fetch-pack may send "filter-*" commands to request a partial/narrow
+clone/fetch where the server omits various objects from the packfile.
-- 
2.9.3



* [PATCH v2 14/19] upload-pack: add object filtering
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (12 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 upload-pack.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/upload-pack.c b/upload-pack.c
index ffb028d..c709054 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -17,6 +17,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "object-filter.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git upload-pack [<options>] <dir>"),
@@ -63,6 +64,9 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int capability_filter_objects_requested;
+static struct object_filter_options filter_options;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -131,6 +135,30 @@ static void create_pack_file(void)
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
 
+	if (filter_options.omit_all_blobs)
+		argv_array_push(&pack_objects.args,
+				("--" CL_ARG_FILTER_OMIT_ALL_BLOBS));
+	else if (filter_options.omit_large_blobs) {
+		if (filter_options.large_byte_limit_string)
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+					 filter_options.large_byte_limit_string);
+		else
+			argv_array_pushf(&pack_objects.args, "--%s=%lu",
+					 CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+					 filter_options.large_byte_limit);
+	}
+	else if (filter_options.use_sparse) {
+		if (oidcmp(&filter_options.sparse_oid, &null_oid))
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_USE_SPARSE,
+					 oid_to_hex(&filter_options.sparse_oid));
+		else
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_USE_SPARSE,
+					 filter_options.sparse_value);
+	}
+
 	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
@@ -793,6 +821,12 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (object_filter_hand_parse_protocol(&filter_options, line, 0, 0)) {
+			if (!capability_filter_objects_requested)
+				die("git upload-pack: object filtering requires '%s' capability",
+				    PROTOCOL_CAPABILITY_FILTER_OBJECTS);
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_sha1_hex(arg, sha1_buf))
 			die("git upload-pack: protocol error, "
@@ -820,6 +854,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+			capability_filter_objects_requested = 1;
 
 		o = parse_object(sha1_buf);
 		if (!o) {
@@ -928,7 +964,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow deepen-since deepen-not"
-		" deepen-relative no-progress include-tag multi_ack_detailed";
+		" deepen-relative no-progress include-tag multi_ack_detailed"
+		" " PROTOCOL_CAPABILITY_FILTER_OBJECTS;
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.9.3



* [PATCH v2 15/19] fetch-pack: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (13 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c |  3 +++
 fetch-pack.c         | 28 ++++++++++++++++++++++++++++
 fetch-pack.h         |  2 ++
 transport.c          | 27 +++++++++++++++++++++++++++
 transport.h          |  8 ++++++++
 5 files changed, 68 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..72f9063 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.update_shallow = 1;
 			continue;
 		}
+		if (object_filter_hand_parse_arg(&args.filter_options, arg, 0, 0)) {
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b05..642077d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -374,6 +374,8 @@ static int find_common(struct fetch_pack_args *args,
 			if (prefer_ofs_delta)   strbuf_addstr(&c, " ofs-delta");
 			if (deepen_since_ok)    strbuf_addstr(&c, " deepen-since");
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
+			if (object_filter_enabled(&args->filter_options))
+				strbuf_addstr(&c, (" " PROTOCOL_CAPABILITY_FILTER_OBJECTS));
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
@@ -404,6 +406,18 @@ static int find_common(struct fetch_pack_args *args,
 			packet_buf_write(&req_buf, "deepen-not %s", s->string);
 		}
 	}
+
+	if (args->filter_options.omit_all_blobs)
+		packet_buf_write(&req_buf, PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS);
+	else if (args->filter_options.omit_large_blobs)
+		packet_buf_write(&req_buf,
+				 PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS " %lu",
+				 args->filter_options.large_byte_limit);
+	else if (args->filter_options.use_sparse)
+		packet_buf_write(&req_buf,
+				 PROTOCOL_REQUEST_FILTER_USE_SPARSE " %s",
+				 args->filter_options.sparse_value);
+
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
 
@@ -811,6 +825,15 @@ static int get_pack(struct fetch_pack_args *args,
 					"--keep=fetch-pack %"PRIuMAX " on %s",
 					(uintmax_t)getpid(), hostname);
 		}
+
+		/*
+		 * Relax consistency check to allow missing blobs (presumably
+		 * because they are exactly the set that we requested be
+		 * omitted).
+		 */
+		if (object_filter_enabled(&args->filter_options))
+			argv_array_push(&cmd.args, ("--" CL_ARG_FILTER_RELAX));
+
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 	}
@@ -924,6 +947,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		prefer_ofs_delta = 0;
 
+	if (server_supports(PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+		print_verbose(args, _("Server supports "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+	else if (object_filter_enabled(&args->filter_options))
+		die(_("Server does not support "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+
 	if ((agent_feature = server_feature_value("agent", &agent_len))) {
 		agent_supported = 1;
 		if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43..5e6bf3b 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "object-filter.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
+	struct object_filter_options filter_options;
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/transport.c b/transport.c
index 4d33138..7abf0b6 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,32 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_ALL_BLOBS)) {
+		opts->filter_options.omit_all_blobs = !!value;
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS)) {
+		opts->filter_options.omit_large_blobs = 1;
+		opts->filter_options.large_byte_limit_string = value;
+		if (!value)
+			opts->filter_options.large_byte_limit = 0;
+		else if (!git_parse_ulong(value,
+					  &opts->filter_options.large_byte_limit))
+			die(_("transport: invalid filter value '%s'"), value);
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_USE_SPARSE)) {
+		opts->filter_options.use_sparse = 1;
+		opts->filter_options.sparse_value = value;
+		/*
+		 * We're constrained by the API for this set_ operation and
+		 * only take a single value.  We don't want to do the get_sha1*()
+		 * lookup (possibly for the second time), because the caller
+		 * should already know and have normalized the hex OID string
+		 * (assuming that it used the normal parsing methods).  So we
+		 * assume that the above string value is sufficient here and
+		 * can just NULL the binary OID field.
+		 */
+		oidcpy(&opts->filter_options.sparse_oid, &null_oid);
+		return 0;
 	}
 	return 1;
 }
@@ -228,6 +254,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
+	args.filter_options = data->options.filter_options;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index bc55715..490f827 100644
--- a/transport.h
+++ b/transport.h
@@ -4,6 +4,8 @@
 #include "cache.h"
 #include "run-command.h"
 #include "remote.h"
+#include "fetch-pack.h"
+#include "object-filter.h"
 
 struct string_list;
 
@@ -21,6 +23,7 @@ struct git_transport_options {
 	const char *uploadpack;
 	const char *receivepack;
 	struct push_cas_option *cas;
+	struct object_filter_options filter_options;
 };
 
 enum transport_family {
@@ -210,6 +213,11 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* See Documentation/technical/pack-protocol.txt */
+#define TRANS_OPT_FILTER_OMIT_ALL_BLOBS   "filter-omit-all-blobs"
+#define TRANS_OPT_FILTER_OMIT_LARGE_BLOBS "filter-omit-large-blobs"
+#define TRANS_OPT_FILTER_USE_SPARSE       "filter-use-sparse"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.9.3
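
On the client side, the request built in find_common() boils down to
choosing one keyword from the protocol documentation and appending its
argument.  Here is a minimal stand-alone sketch of that formatting step,
with the keywords spelled out literally.  struct filter_request and
format_filter_request() are hypothetical; the patch itself writes
straight into the pkt-line buffer with packet_buf_write().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down view of the filter settings. */
struct filter_request {
	int omit_all_blobs;
	int omit_large_blobs;
	unsigned long large_byte_limit;
	int use_sparse;
	const char *sparse_value;
};

/*
 * Hypothetical helper: write the request payload into buf.  Returns the
 * payload length, 0 when no filter is active, or -1 if buf is too small.
 * Only one filter is sent; the order mirrors the else-if chain above.
 */
int format_filter_request(const struct filter_request *r, char *buf, size_t len)
{
	int n;

	assert(r && buf && len > 0);
	buf[0] = '\0';
	if (r->omit_all_blobs)
		n = snprintf(buf, len, "filter-omit-all-blobs");
	else if (r->omit_large_blobs)
		n = snprintf(buf, len, "filter-omit-large-blobs %lu",
			     r->large_byte_limit);
	else if (r->use_sparse && r->sparse_value)
		n = snprintf(buf, len, "filter-use-sparse %s", r->sparse_value);
	else
		return 0;
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Note that the byte limit is printed as a plain decimal number here, so
any "k", "m" or "g" suffix given by the user has already been folded
into the value by the time the request is sent.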



* [PATCH v2 16/19] connected: add filter_allow_omitted option to API
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (14 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 connected.c | 3 +++
 connected.h | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/connected.c b/connected.c
index 136c2ac..c25b816 100644
--- a/connected.c
+++ b/connected.c
@@ -62,6 +62,9 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 		argv_array_pushf(&rev_list.args, "--progress=%s",
 				 _("Checking connectivity"));
 
+	if (opt->filter_relax)
+		argv_array_push(&rev_list.args, ("--" CL_ARG_FILTER_RELAX));
+
 	rev_list.git_cmd = 1;
 	rev_list.env = opt->env;
 	rev_list.in = -1;
diff --git a/connected.h b/connected.h
index 4ca325f..370710e 100644
--- a/connected.h
+++ b/connected.h
@@ -35,6 +35,12 @@ struct check_connected_options {
 	int progress;
 
 	/*
+	 * Relax consistency checks for missing blobs (presumably
+	 * due to use of the 'filter-objects' feature).
+	 */
+	int filter_relax;
+
+	/*
 	 * Insert these variables into the environment of the child process.
 	 */
 	const char **env;
-- 
2.9.3



* [PATCH v2 17/19] clone: add filter arguments
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (15 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index a6ae7d6..1408396 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -24,6 +24,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "object-filter.h"
 
 /*
  * Overall FIXMEs:
@@ -57,6 +58,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct object_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -130,6 +132,14 @@ static struct option builtin_clone_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+
+	OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+	/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+	/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 	OPT_END()
 };
 
@@ -643,6 +653,13 @@ static void update_remote_refs(const struct ref *refs,
 	if (check_connectivity) {
 		struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
+		/*
+		 * Relax consistency check to allow missing blobs (presumably
+		 * because they are exactly the set that we requested be
+		 * omitted).
+		 */
+		opt.filter_relax = object_filter_enabled(&filter_options);
+
 		opt.transport = transport;
 		opt.progress = transport->progress;
 
@@ -1059,6 +1076,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			warning(_("--shallow-since is ignored in local clones; use file:// instead."));
 		if (option_not.nr)
 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+		if (object_filter_enabled(&filter_options))
+			warning(_("--filter-* options are ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1090,6 +1109,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
+	if (filter_options.omit_all_blobs)
+		transport_set_option(transport, TRANS_OPT_FILTER_OMIT_ALL_BLOBS, "1");
+	if (filter_options.omit_large_blobs)
+		transport_set_option(transport, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS,
+				     filter_options.large_byte_limit_string);
+	if (filter_options.use_sparse)
+		transport_set_option(transport, TRANS_OPT_FILTER_USE_SPARSE,
+				     filter_options.sparse_value);
+
 	if (transport->smart_options && !deepen)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
-- 
2.9.3



* [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (16 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/index-pack.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4ff567d..30ff409 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -11,6 +11,7 @@
 #include "exec_cmd.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "object-filter.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -80,6 +81,7 @@ static int verbose;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
+static int filter_relax;
 
 static struct progress *progress;
 
@@ -220,6 +222,17 @@ static unsigned check_object(struct object *obj)
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
 		int type = sha1_object_info(obj->oid.hash, &size);
+
+		if (type <= 0 && filter_relax) {
+			/*
+			 * Relax consistency checks to not complain about
+			 * omitted objects (presumably caused by use of
+			 * the 'filter-objects' feature).
+			 */
+			obj->flags |= FLAG_CHECKED;
+			return 0;
+		}
+
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
@@ -1721,6 +1734,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (!strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+				filter_relax = 1;
 			} else
 				usage(index_pack_usage);
 			continue;
-- 
2.9.3



* [PATCH v2 19/19] fetch: add object filtering to fetch
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (17 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f2c2ab..306c165 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "object-filter.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -52,6 +53,7 @@ static const char *recurse_submodules_default;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct object_filter_options filter_options;
 
 static int option_parse_recurse_submodules(const struct option *opt,
 				   const char *arg, int unset)
@@ -141,6 +143,14 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+
+	OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+	/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+	/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 	OPT_END()
 };
 
@@ -733,6 +743,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
+	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
 
 	fp = fopen(filename, "a");
 	if (!fp)
@@ -744,7 +762,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_connected(iterate_ref_map, &rm, NULL)) {
+	if (check_connected(iterate_ref_map, &rm, &opt)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -885,6 +903,13 @@ static int quickfetch(struct ref *ref_map)
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
 	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
+
+	/*
 	 * If we are deepening a shallow clone we already have these
 	 * objects reachable.  Running rev-list here will return with
 	 * a good (0) exit status and we'll bypass the fetch that we
-- 
2.9.3



Thread overview: 20 messages
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
