* [PATCH v2 00/19] WIP object filtering for partial clone
@ 2017-07-13 17:34 Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

This WIP is a follow-up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]

Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback.  This hides the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to actual commands, such as pack-objects.

This version adds that same filtering capability to rev-list, allowing
filtering to be demonstrated without building a packfile.  Filtered
blobs are printed with a leading "~" (along with their sizes).

    $ ./git rev-list --objects HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).

2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
   (but always including ".git*" special files).

3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
   corresponding sparse-checkout.

Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement should be able to also filter unneeded tree objects.
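
To illustrate the <n>[kmg] size argument accepted by filter 2 above, here
is a minimal standalone sketch.  The helper name parse_size_kmg() is
hypothetical and is not taken from this series; git has its own parsing
helpers, which this does not try to reproduce.  It assumes that k, m and
g are 1024-based multiples.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from this series): parse "<n>[kmg]" into a
 * byte count.  Assumes k/m/g are binary multiples (1024-based).
 * Returns 0 on success, -1 on malformed input or overflow.
 */
int parse_size_kmg(const char *arg, int64_t *out)
{
	char *end;
	unsigned long long n;
	int shift = 0;

	/* Require a leading digit so that "", "-1" and " 5" are rejected. */
	if (!arg || *arg < '0' || *arg > '9')
		return -1;

	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno)
		return -1;

	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	}
	if (*end)		/* trailing garbage after the suffix */
		return -1;
	if (n > ((unsigned long long)INT64_MAX >> shift))
		return -1;	/* scaled value would not fit in int64_t */

	*out = (int64_t)(n << shift);
	return 0;
}
```

Under these assumptions "1k" parses to 1024 bytes and "12x" is rejected.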

This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.

As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks.  Maintaining info
on missing blobs is currently being discussed in [3].

TODO
1. Incorporate with a patch series like [4] to dynamically fetch a
   missing blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
   to them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.


[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/


Jeff Hostetler (19):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filters: add omit-all-blobs filter
  list-objects-filters: add omit-large-blobs filter
  list-objects-filters: add use-sparse-checkout filter
  object-filter: common declarations for object filtering
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text
  upload-pack: add filter-objects to protocol documentation
  upload-pack: add object filtering
  fetch-pack: add object filtering support
  connected: add filter_allow_omitted option to API
  clone: add filter arguments
  index-pack: relax consistency checks for omitted objects
  fetch: add object filtering to fetch

 Documentation/git-pack-objects.txt                |  14 +
 Documentation/git-rev-list.txt                    |   7 +-
 Documentation/rev-list-options.txt                |  26 ++
 Documentation/technical/pack-protocol.txt         |  16 +
 Documentation/technical/protocol-capabilities.txt |   7 +
 Makefile                                          |   3 +
 builtin/clone.c                                   |  28 ++
 builtin/fetch-pack.c                              |   3 +
 builtin/fetch.c                                   |  27 +-
 builtin/index-pack.c                              |  15 +
 builtin/pack-objects.c                            |  33 +-
 builtin/rev-list.c                                |  58 +++-
 connected.c                                       |   3 +
 connected.h                                       |   6 +
 dir.c                                             |  53 +++-
 dir.h                                             |   4 +
 fetch-pack.c                                      |  28 ++
 fetch-pack.h                                      |   2 +
 list-objects-filters.c                            | 361 ++++++++++++++++++++++
 list-objects-filters.h                            |  45 +++
 list-objects.c                                    |  66 +++-
 list-objects.h                                    |  30 ++
 object-filter.c                                   | 201 ++++++++++++
 object-filter.h                                   | 145 +++++++++
 oidset2.c                                         | 101 ++++++
 oidset2.h                                         |  56 ++++
 t/t6112-rev-list-filters-objects.sh               |  37 +++
 transport.c                                       |  27 ++
 transport.h                                       |   8 +
 upload-pack.c                                     |  39 ++-
 30 files changed, 1425 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100644 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3



* [PATCH v2 01/19] dir: refactor add_excludes()
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.

Add add_excludes_from_blob_to_list() to allow an exclude
file to be specified by OID.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
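
Note (illustration only, not part of the patch): the new blob reader
appends a newline when the blob does not end with one, so that the
line-oriented parser always sees a terminated final line.  A minimal
standalone sketch of that step follows.  ensure_trailing_newline() is a
hypothetical name, and plain realloc() stands in for git's xrealloc().

```c
#include <stdlib.h>

/*
 * Hypothetical standalone helper: make sure a heap buffer of *size
 * bytes ends in '\n', growing it by one byte if needed.  An empty
 * buffer is returned unchanged.  Returns the (possibly moved) buffer,
 * or NULL if realloc() fails, in which case the original buffer is
 * left untouched and still owned by the caller.
 */
char *ensure_trailing_newline(char *buf, size_t *size)
{
	char *grown;

	if (*size == 0 || buf[*size - 1] == '\n')
		return buf;

	grown = realloc(buf, *size + 1);
	if (!grown)
		return NULL;

	grown[(*size)++] = '\n';
	return grown;
}
```

Calling it a second time on the same buffer is a no-op, since the last
byte is then already a newline.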
 dir.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 dir.h |  4 ++++
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 31f9343..aeba965 100644
--- a/dir.c
+++ b/dir.c
@@ -725,6 +725,11 @@ static void invalidate_directory(struct untracked_cache *uc,
 		dir->dirs[i]->recurse = 0;
 }
 
+static int add_excludes_from_buffer(
+	char *buf, size_t size,
+	const char *base, int baselen,
+	struct exclude_list *el);
+
 /*
  * Given a file with name "fname", read it (either from disk, or from
  * the index if "check_index" is non-zero), parse it and store the
@@ -739,9 +744,9 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 			struct sha1_stat *sha1_stat)
 {
 	struct stat st;
-	int fd, i, lineno = 1;
+	int fd;
 	size_t size = 0;
-	char *buf, *entry;
+	char *buf;
 
 	fd = open(fname, O_RDONLY);
 	if (fd < 0 || fstat(fd, &st) < 0) {
@@ -798,6 +803,18 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 		}
 	}
 
+	add_excludes_from_buffer(buf, size, base, baselen, el);
+	return 0;
+}
+
+static int add_excludes_from_buffer(
+	char *buf, size_t size,
+	const char *base, int baselen,
+	struct exclude_list *el)
+{
+	int i, lineno = 1;
+	char *entry;
+
 	el->filebuf = buf;
 
 	if (skip_utf8_bom(&buf, size))
@@ -826,6 +843,38 @@ int add_excludes_from_file_to_list(const char *fname, const char *base,
 	return add_excludes(fname, base, baselen, el, check_index, NULL);
 }
 
+int add_excludes_from_blob_to_list(
+	struct object_id *oid,
+	const char *base, int baselen,
+	struct exclude_list *el)
+{
+	char *buf;
+	unsigned long size;
+	enum object_type type;
+
+	buf = read_sha1_file(oid->hash, &type, &size);
+	if (!buf)
+		return -1;
+
+	if (type != OBJ_BLOB) {
+		free(buf);
+		return -1;
+	}
+
+	if (size == 0) {
+		free(buf);
+		return 0;
+	}
+
+	if (buf[size - 1] != '\n') {
+		buf = xrealloc(buf, st_add(size, 1));
+		buf[size++] = '\n';
+	}
+
+	add_excludes_from_buffer(buf, size, base, baselen, el);
+	return 0;
+}
+
 struct exclude_list *add_exclude_list(struct dir_struct *dir,
 				      int group_type, const char *src)
 {
diff --git a/dir.h b/dir.h
index edb5fda..8e754e5 100644
--- a/dir.h
+++ b/dir.h
@@ -242,6 +242,10 @@ extern struct exclude_list *add_exclude_list(struct dir_struct *dir,
 extern int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen,
 					  struct exclude_list *el, int check_index);
 extern void add_excludes_from_file(struct dir_struct *, const char *fname);
+extern int add_excludes_from_blob_to_list(
+	struct object_id *oid,
+	const char *base, int baselen,
+	struct exclude_list *el);
 extern void parse_exclude_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen);
 extern void add_exclude(const char *string, const char *base,
 			int baselen, struct exclude_list *el, int srcpos);
-- 
2.9.3



* [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a subclass of oidset where each entry has a
field to store the length of the object's content
and an optional pathname.

This will be used in a future commit to build a
manifest of omitted objects in a partial/narrow
clone/fetch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
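
Note (illustration only, not part of the patch): oidset2_foreach()
copies the entry pointers into an array and sorts it by OID before
invoking the callback.  The sketch below shows that pointer-array sort
in isolation.  struct entry, ID_LEN and sort_entries_by_id() are
simplified, hypothetical stand-ins and not the types from oidset2.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 20	/* assumption: SHA-1 sized object ids */

struct entry {		/* simplified stand-in for oidset2_entry */
	unsigned char id[ID_LEN];
	int64_t object_length;	/* signed; -1 when unknown */
};

/*
 * qsort() passes pointers to array elements.  The elements here are
 * themselves pointers, so each argument is a pointer to a pointer.
 */
int entry_ptr_cmp(const void *a, const void *b)
{
	const struct entry *ea = *(const struct entry *const *)a;
	const struct entry *eb = *(const struct entry *const *)b;

	return memcmp(ea->id, eb->id, ID_LEN);
}

/* Sort an array of entry pointers in place by id. */
void sort_entries_by_id(struct entry **array, size_t nr)
{
	/* The element size is that of a pointer, not of struct entry. */
	qsort(array, nr, sizeof(*array), entry_ptr_cmp);
}
```

Only the pointers move, so the entries themselves stay where the set
stored them.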
 Makefile  |   1 +
 oidset2.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 oidset2.h |  56 ++++++++++++++++++++++++++++++++++
 3 files changed, 158 insertions(+)
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h

diff --git a/Makefile b/Makefile
index ffa6da7..d590508 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/oidset2.c b/oidset2.c
new file mode 100644
index 0000000..806d153
--- /dev/null
+++ b/oidset2.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "oidset2.h"
+
+static int oidset2_hashcmp(const void *va, const void *vb,
+			  const void *vkey)
+{
+	const struct oidset2_entry *a = va, *b = vb;
+	const struct object_id *key = vkey;
+	return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *value;
+
+	if (!set->map.cmpfn)
+		return NULL;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	value = hashmap_get(&set->map, &key, oid);
+
+	return value;
+}
+
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid)
+{
+	return !!oidset2_get(set, oid);
+}
+
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname)
+{
+	struct oidset2_entry *entry;
+
+	if (!set->map.cmpfn)
+		hashmap_init(&set->map, oidset2_hashcmp, 0);
+
+	if (oidset2_contains(set, oid))
+		return 1;
+
+	entry = xcalloc(1, sizeof(*entry));
+	hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+	oidcpy(&entry->oid, oid);
+
+	entry->object_length = object_length;
+	if (pathname)
+		entry->pathname = xstrdup(pathname);
+
+	hashmap_add(&set->map, entry);
+	return 0;
+}
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *e;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	e = hashmap_remove(&set->map, &key, oid);
+
+	free(e->pathname);
+	free(e);
+}
+
+void oidset2_clear(struct oidset2 *set)
+{
+	hashmap_free(&set->map, 1);
+}
+
+static int oidset2_cmp(const void *a, const void *b)
+{
+	const struct oidset2_entry *ae = *((const struct oidset2_entry **)a);
+	const struct oidset2_entry *be = *((const struct oidset2_entry **)b);
+
+	return oidcmp(&ae->oid, &be->oid);
+}
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data)
+{
+	struct hashmap_iter iter;
+	struct oidset2_entry **array;
+	struct oidset2_entry *e;
+	int j, k;
+
+	array = xcalloc(set->map.size, sizeof(*array));
+
+	hashmap_iter_init(&set->map, &iter);
+	k = 0;
+	while ((e = hashmap_iter_next(&iter)))
+		array[k++] = e;
+
+	QSORT(array, k, oidset2_cmp);
+
+	for (j = 0; j < k; j++) {
+		e = array[j];
+		cb(j, k, e, cb_data);
+	}
+
+	free(array);
+}
diff --git a/oidset2.h b/oidset2.h
new file mode 100644
index 0000000..c498eae
--- /dev/null
+++ b/oidset2.h
@@ -0,0 +1,56 @@
+#ifndef OIDSET2_H
+#define OIDSET2_H
+
+/**
+ * oidset2 is a variant of oidset, but allows additional fields for each object.
+ */
+
+/**
+ * A single oidset2; should be zero-initialized (or use OIDSET2_INIT).
+ */
+struct oidset2 {
+	struct hashmap map;
+};
+
+#define OIDSET2_INIT { { NULL } }
+
+struct oidset2_entry {
+	struct hashmap_entry hash;
+	struct object_id oid;
+
+	int64_t object_length;	/* This is SIGNED. Use -1 when unknown. */
+	char *pathname;
+};
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname);
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid);
+
+typedef void (*oidset2_foreach_cb)(
+	int i, int i_limit,
+	struct oidset2_entry *e, void *cb_data);
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data);
+
+/**
+ * Remove all entries from the oidset2, freeing any resources associated with
+ * it.
+ */
+void oidset2_clear(struct oidset2 *set);
+
+#endif /* OIDSET2_H */
-- 
2.9.3



* [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create traverse_commit_list_filtered() and add a filtering
interface to allow certain objects to be omitted (not shown)
during a traversal.

Update traverse_commit_list() to be a wrapper for the above.

Filtering will be used in a future commit by rev-list and
pack-objects for narrow/partial clone/fetch to omit certain
blobs from the output.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
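
Note (illustration only, not part of the patch): the filter callback
returns two independent bits.  One says whether the object should be
marked SEEN, so it is never offered to the filter again.  The other says
whether it should be shown.  The sketch below models that contract with
simplified stand-ins; struct obj, FLAG_SEEN, the FR_* names and visit()
are hypothetical and only mirror the LOFR_* handling in process_blob().

```c
#include <stddef.h>

enum filter_result {		/* mirrors the LOFR_* bits */
	FR_ZERO      = 0,
	FR_MARK_SEEN = 1 << 0,
	FR_SHOW      = 1 << 1,
};

#define FLAG_SEEN (1u << 0)	/* stand-in for git's SEEN object flag */

struct obj {			/* simplified stand-in for struct object */
	unsigned flags;
};

typedef int (*filter_fn)(struct obj *o, const char *path, void *data);

/*
 * Visit one object.  Without a filter the result defaults to
 * "mark seen and show", which is the unfiltered behavior.
 * Returns 1 if the object would be shown, 0 otherwise.
 */
int visit(struct obj *o, const char *path, filter_fn filter, void *data)
{
	int r = FR_MARK_SEEN | FR_SHOW;

	if (o->flags & FLAG_SEEN)
		return 0;	/* a final decision was already made */
	if (filter)
		r = filter(o, path, data);
	if (r & FR_MARK_SEEN)
		o->flags |= FLAG_SEEN;
	return !!(r & FR_SHOW);
}

/* Hard omit: final decision, never shown, never asked again. */
int hard_omit(struct obj *o, const char *path, void *data)
{
	(void)o; (void)path; (void)data;
	return FR_MARK_SEEN;
}

/* Provisional omit: not shown now, but may be reconsidered later. */
int provisional_omit(struct obj *o, const char *path, void *data)
{
	(void)o; (void)path; (void)data;
	return FR_ZERO;
}
```

Returning neither bit leaves the object eligible for another visit if it
is reached again under a different path.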
 list-objects.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--------------
 list-objects.h | 30 ++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..8dddeda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *path,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->blob_objects)
 		return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
 		die("bad blob object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	obj->flags |= SEEN;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	show(obj, path->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, path->buf, cb_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *base,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->tree_objects)
 		return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
-	obj->flags |= SEEN;
 	strbuf_addstr(base, name);
-	show(obj, base->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, base->buf, cb_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
 			process_tree(revs,
 				     lookup_tree(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.oid->hash,
 					show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
 			process_blob(revs,
 				     lookup_blob(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 	}
+
+	if (filter) {
+		r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+		if (r & LOFR_MARK_SEEN)
+			obj->flags |= SEEN;
+		if (r & LOFR_SHOW)
+			show(obj, base->buf, cb_data);
+	}
+
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+void traverse_commit_list_filtered(
+	struct rev_info *revs,
+	show_commit_fn show_commit, show_object_fn show_object, void *show_data,
+	filter_object_fn filter, void *filter_data)
 {
 	int i;
 	struct commit *commit;
@@ -200,7 +223,7 @@ void traverse_commit_list(struct rev_info *revs,
 		 */
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
+		show_commit(commit, show_data);
 	}
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
@@ -211,19 +234,19 @@ void traverse_commit_list(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, data);
+			show_object(obj, name, show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     &base, path, show_data, filter, filter_data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     &base, path, show_data, filter, filter_data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
@@ -232,3 +255,14 @@ void traverse_commit_list(struct rev_info *revs,
 	object_array_clear(&revs->pending);
 	strbuf_release(&base);
 }
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *show_data)
+{
+	traverse_commit_list_filtered(
+		revs,
+		show_commit, show_object, show_data,
+		NULL, NULL);
+}
diff --git a/list-objects.h b/list-objects.h
index 0cebf85..964e7d3 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -8,4 +8,34 @@ void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, voi
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
 
+enum list_objects_filter_result {
+	LOFR_ZERO      = 0,
+	LOFR_MARK_SEEN = 1<<0,
+	LOFR_SHOW      = 1<<1,
+};
+
+/* See object.h and revision.h */
+#define FILTER_REVISIT (1<<25)
+
+enum list_objects_filter_type {
+	LOFT_BEGIN_TREE,
+	LOFT_END_TREE,
+	LOFT_BLOB
+};
+
+typedef enum list_objects_filter_result list_objects_filter_result;
+typedef enum list_objects_filter_type list_objects_filter_type;
+
+typedef list_objects_filter_result (*filter_object_fn)(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data);
+
+void traverse_commit_list_filtered(
+	struct rev_info *,
+	show_commit_fn, show_object_fn, void *show_data,
+	filter_object_fn filter, void *filter_data);
+
 #endif
-- 
2.9.3



* [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a simple filter for traverse_commit_list_filtered() to
omit all blobs from the result.

This filter will be used in a future commit by rev-list and
pack-objects to create a "commits and trees" result.  This
is intended for a narrow/partial clone/fetch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Makefile               |  1 +
 list-objects-filters.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 17 ++++++++++
 3 files changed, 103 insertions(+)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h

diff --git a/Makefile b/Makefile
index d590508..48fdcf2 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filters.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filters.c b/list-objects-filters.c
new file mode 100644
index 0000000..f29d8bc
--- /dev/null
+++ b/list-objects-filters.c
@@ -0,0 +1,85 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filters.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ */
+struct filter_omit_all_blobs_data {
+	struct oidset2 omits;
+};
+
+static list_objects_filter_result filter_omit_all_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_all_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * Since we always omit all blobs (and never provisionally omit),
+		 * we should never see a blob twice.
+		 */
+		assert(!oidset2_contains(&filter_data->omits, &obj->oid));
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		/* Insert OID into the omitted list. No need for a pathname. */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+	}
+}
+
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data)
+{
+	struct filter_omit_all_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_all_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
new file mode 100644
index 0000000..b981020
--- /dev/null
+++ b/list-objects-filters.h
@@ -0,0 +1,17 @@
+#ifndef LIST_OBJECTS_FILTERS_H
+#define LIST_OBJECTS_FILTERS_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3



* [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_filtered() to omit
blobs larger than a requested size from the result, but always
include ".git*" special files.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
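
Note (illustration only, not part of the patch): a minimal sketch of the
per-blob decision described above.  is_special_filename() and
decide_blob() are hypothetical names.  The sketch applies the ".git*"
test on every visit, including the first, which is one reading of
"always include"; it also mirrors the patch's "<" comparison, so a blob
of exactly the limit size counts as large.

```c
#include <stdint.h>
#include <string.h>

/* ".git" followed by at least one more character, e.g. ".gitignore". */
int is_special_filename(const char *filename)
{
	return !strncmp(filename, ".git", 4) && filename[4] != '\0';
}

enum decision {
	INCLUDE,		/* mark seen and show */
	PROVISIONAL_OMIT,	/* leave unmarked; may be revisited */
};

/*
 * Decide what to do with one blob reached under `filename`.
 * Blobs strictly smaller than max_bytes are included, as are blobs
 * with a special filename regardless of size.
 */
enum decision decide_blob(const char *filename, int64_t size,
			  int64_t max_bytes)
{
	if (size < max_bytes)
		return INCLUDE;
	if (is_special_filename(filename))
		return INCLUDE;
	return PROVISIONAL_OMIT;
}
```

The omit is provisional because the same blob may be reached again under
a special filename later in the traversal.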
 list-objects-filters.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 12 +++++++
 2 files changed, 109 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+	struct oidset2 omits;
+	int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_large_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * If previously provisionally omitted (because of size), see if the
+		 * current filename is special and force it to be included.
+		 */
+		if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+			if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+				oidset2_remove(&filter_data->omits, &obj->oid);
+				return LOFR_MARK_SEEN | LOFR_SHOW;
+			}
+			return LOFR_ZERO; /* continue provisionally omitting it */
+		}
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		if (object_length < filter_data->max_bytes)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		/*
+		 * Provisionally omit it.  We've already established that this blob
+		 * is too big and doesn't have a special filename, so we WANT to
+		 * omit it.  However, there may be a special file elsewhere in the
+		 * tree that references this same blob, so we cannot reject it yet.
+		 * Leave the LOFR_ bits unset so that if the blob appears again in
+		 * the traversal, we will be asked again.
+		 *
+		 * No need for a pathname, since we only test for special filenames
+		 * above.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit)
+{
+	struct filter_omit_large_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+	d.max_bytes = large_byte_limit;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_large_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
 	oidset2_foreach_cb print_omitted_object,
 	void *ctx_data);
 
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3


* [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_filtered() to omit the
blobs that would not be needed by a sparse checkout using the
given sparse-checkout spec.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

A future enhancement should be able to also omit tree objects
not needed by such a sparse checkout, but that is not currently
supported.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
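The provisional-omit rule described above can be illustrated in
isolation.  What follows is a minimal standalone sketch, not code from
this series: the names blob_state, visit_blob(), path_is_included() and
count_omitted() are hypothetical, and the "src/" prefix test merely
stands in for a real sparse-checkout match.  It assumes that a blob
reached under several pathnames should be shown as soon as any one of
those pathnames is included, and reported as omitted only if none is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum blob_state { BLOB_UNSEEN, BLOB_PROV_OMIT, BLOB_SHOWN };

struct blob {
	const char *id;          /* illustrative object name */
	enum blob_state state;
};

/* Hypothetical stand-in for the sparse-checkout match: keep paths under "src/". */
int path_is_included(const char *pathname)
{
	return strncmp(pathname, "src/", 4) == 0;
}

/* Visit one (blob, pathname) reference; returns 1 if this visit shows the blob. */
int visit_blob(struct blob *b, const char *pathname)
{
	assert(b && pathname);
	if (b->state == BLOB_SHOWN)
		return 0;                 /* already emitted, nothing to do */
	if (path_is_included(pathname)) {
		b->state = BLOB_SHOWN;    /* rescinds any provisional omit */
		return 1;
	}
	b->state = BLOB_PROV_OMIT;        /* ask again on a later reference */
	return 0;
}

/* After the walk, whatever is still provisionally omitted is the manifest. */
size_t count_omitted(const struct blob *blobs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (blobs[i].state == BLOB_PROV_OMIT)
			n++;
	return n;
}
```

In this sketch a blob first visited as "docs/readme" is provisionally
omitted, and a later visit of the same blob as "src/readme" shows it
and removes it from the omitted count.
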
 list-objects-filters.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h |  16 +++++
 2 files changed, 195 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+	int defval;
+	unsigned child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+	struct oidset2 omits;
+	struct exclude_list el;
+
+	size_t nr, alloc;
+	struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_use_sparse_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	int val, dtype;
+	unsigned long s;
+	enum object_type t;
+	struct frame *frame;
+
+	switch (filter_type) {
+	default:
+		die("unknown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		dtype = DT_DIR;
+		val = is_excluded_from_list(pathname, strlen(pathname),
+					    filename, &dtype, &filter_data->el);
+		if (val < 0)
+			val = filter_data->array_frame[filter_data->nr].defval;
+
+		ALLOC_GROW(filter_data->array_frame, filter_data->nr + 2,
+			   filter_data->alloc);
+		filter_data->nr++;
+		filter_data->array_frame[filter_data->nr].defval = val;
+		filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+		/*
+		 * A directory with this tree OID may appear in multiple
+		 * places in the tree. (Think of a directory move, with
+		 * no other changes.)  And with a different pathname, the
+		 * is_excluded...() results for this directory and items
+		 * contained within it may be different.  So we cannot
+		 * mark it SEEN (yet), since that will prevent process_tree()
+		 * from revisiting this tree object with other pathnames.
+		 *
+		 * Only SHOW the tree object the first time we visit this
+		 * tree object.
+		 *
+		 * We always show all tree objects.  A future optimization
+		 * may want to attempt to narrow this.
+		 */
+		if (obj->flags & FILTER_REVISIT)
+			return LOFR_ZERO;
+		obj->flags |= FILTER_REVISIT;
+		return LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		assert(filter_data->nr > 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+		filter_data->nr--;
+
+		/*
+		 * Tell our parent directory if any of our children were
+		 * provisionally omitted.
+		 */
+		filter_data->array_frame[filter_data->nr].child_prov_omit |=
+			frame->child_prov_omit;
+
+		/*
+		 * If there are NO provisionally omitted child objects (ALL child
+		 * objects in this folder were INCLUDED), then we can mark the
+		 * folder as SEEN (so we will not have to revisit it again).
+		 */
+		if (!frame->child_prov_omit)
+			return LOFR_MARK_SEEN;
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+
+		/*
+		 * If we previously provisionally omitted this blob because
+		 * its pathname was not in the sparse-checkout AND this
+		 * reference to the blob has the same pathname, we can avoid
+		 * repeating the exclusion logic on this pathname and just
+		 * continue to provisionally omit it.
+		 */
+		if (obj->flags & FILTER_REVISIT) {
+			struct oidset2_entry *entry_prev;
+			entry_prev = oidset2_get(&filter_data->omits, &obj->oid);
+			if (entry_prev && !strcmp(pathname, entry_prev->pathname)) {
+				frame->child_prov_omit = 1;
+				return LOFR_ZERO;
+			}
+		}
+
+		dtype = DT_REG;
+		val = is_excluded_from_list(pathname, strlen(pathname),
+					    filename, &dtype, &filter_data->el);
+		if (val < 0)
+			val = frame->defval;
+		if (val > 0)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		/*
+		 * Provisionally omit it.  We've already established that
+		 * this pathname is not in the sparse-checkout specification,
+		 * so we WANT to omit this blob.  However, a pathname elsewhere
+		 * in the tree may also reference this same blob, so we cannot
+		 * reject it yet.  Leave the LOFR_ bits unset so that if the
+		 * blob appears again in the traversal, we will be asked again.
+		 *
+		 * The pathname we associate with this omit is just the first
+		 * one we saw for this blob.  Other instances of this blob may
+		 * have other pathnames and that is fine.  We just use it for
+		 * perf because most of the time, the blob will be in the same
+		 * place as we walk the commits.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       pathname);
+		obj->flags |= FILTER_REVISIT;
+		frame->child_prov_omit = 1;
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_use_sparse(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	struct object_id *oid)
+{
+	struct filter_use_sparse_data d;
+
+	memset(&d, 0, sizeof(d));
+	if (add_excludes_from_blob_to_list(oid, NULL, 0, &d.el) < 0)
+		die("filter_use_sparse could not load specification");
+	ALLOC_GROW(d.array_frame, d.nr + 1, d.alloc);
+	d.array_frame[d.nr].defval = 0; /* default to include */
+	d.array_frame[d.nr].child_prov_omit = 0;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_use_sparse, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index 32b2833..52e507b 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -26,4 +26,20 @@ void traverse_commit_list_omit_large_blobs(
 	void *ctx_data,
 	int64_t large_byte_limit);
 
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+void traverse_commit_list_use_sparse(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	struct object_id *oid);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3


* [PATCH v2 07/19] object-filter: common declarations for object filtering
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Create common routines and defines for parsing
object-filter-related command line arguments and
pack-protocol fields.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
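The two spellings handled by the hand parsers in this patch
("--<key>=<value>" on the command line, "<key> <value>" on a protocol
line) can be illustrated with a small standalone sketch.  This is not
code from the series: strip_prefix() is a local stand-in written for
the example, and value_from_cl_arg() and value_from_protocol_line() are
hypothetical names.  Only the key string is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_OMIT_LARGE "filter-omit-large-blobs"

/* Local helper: if str starts with prefix, point *out past it and return 1. */
int strip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	assert(str && prefix && out);
	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Command-line form "--<key>=<value>"; returns the value or NULL. */
const char *value_from_cl_arg(const char *arg)
{
	const char *v;

	if (strip_prefix(arg, "--" KEY_OMIT_LARGE "=", &v))
		return v;
	return NULL;
}

/* Protocol form "<key> <value>"; returns the value or NULL. */
const char *value_from_protocol_line(const char *line)
{
	const char *v;

	if (strip_prefix(line, KEY_OMIT_LARGE " ", &v))
		return v;
	return NULL;
}
```

Building both prefixes from one key macro keeps the command-line and
protocol spellings from drifting apart, which appears to be the intent
of the shared CL_ARG_* defines.
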
 Makefile        |   1 +
 object-filter.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 object-filter.h | 145 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 347 insertions(+)
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h

diff --git a/Makefile b/Makefile
index 48fdcf2..daa9ea2 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += object-filter.o
 LIB_OBJS += oidset.o
 LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
diff --git a/object-filter.c b/object-filter.c
new file mode 100644
index 0000000..5be6129
--- /dev/null
+++ b/object-filter.c
@@ -0,0 +1,201 @@
+#include "cache.h"
+#include "commit.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "oidset2.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
+
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options)
+{
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->omit_all_blobs = 1;
+	return 0;
+}
+
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+				  const char *arg)
+{
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->omit_large_blobs = 1;
+
+	/* we allow "<digits>[kmg]" */
+	if (!git_parse_ulong(arg, &filter_options->large_byte_limit))
+		die(_("invalid size limit for large object filter"));
+
+	filter_options->large_byte_limit_string = xstrdup(arg);
+	return 0;
+}
+
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+			    const char *arg)
+{
+	struct object_context oc;
+
+	if (object_filter_enabled(filter_options))
+		die(_("multiple object filter types cannot be combined"));
+
+	filter_options->use_sparse = 1;
+
+	/*
+	 * The command line argument needs to resolve to a known OID
+	 * representing the content of the desired sparse-checkout file.
+	 * We allow various syntax forms for the convenience of the user.
+	 * See sha1_name.c:get_sha1_with_context_1().
+	 *
+	 * Try to evaluate the arg locally in case they use one of the
+	 * convenience patterns.  This must resolve to a blob.
+	 */
+	if (get_sha1_with_context(arg, GET_SHA1_BLOB,
+				  filter_options->sparse_oid.hash, &oc)) {
+		/*
+		 * If that fails, keep the original string in case a client
+		 * command wants to send it to the server.  This allows the
+		 * client to name an OID for a blob they don't have.
+		 */
+		filter_options->sparse_value = xstrdup(arg);
+		oidcpy(&filter_options->sparse_oid, &null_oid);
+	} else {
+		/*
+		 * Round-trip the found OID to normalize it.
+		 */
+		filter_options->sparse_value =
+			xstrdup(oid_to_hex(&filter_options->sparse_oid));
+	}
+
+	return 0;
+}
+
+int parse_filter_print_manifest(struct object_filter_options *filter_options)
+{
+	filter_options->print_manifest = 1;
+	return 0;
+}
+
+int parse_filter_relax(struct object_filter_options *filter_options)
+{
+	filter_options->relax = 1;
+	return 0;
+}
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+				    const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_omit_all_blobs(filter_options);
+}
+
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+				      const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(arg);
+	assert(!unset);
+
+	return parse_filter_omit_large_blobs(filter_options, arg);
+}
+
+int opt_parse_filter_use_sparse(const struct option *opt,
+				const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(arg);
+	assert(!unset);
+
+	return parse_filter_use_sparse(filter_options, arg);
+}
+
+int opt_parse_filter_print_manifest(const struct option *opt,
+				    const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_print_manifest(filter_options);
+}
+
+int opt_parse_filter_relax(const struct option *opt,
+			   const char *arg, int unset)
+{
+	struct object_filter_options *filter_options = opt->value;
+
+	assert(!arg);
+	assert(!unset);
+
+	return parse_filter_relax(filter_options);
+}
+
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+				 const char *arg,
+				 int allow_print_manifest,
+				 int allow_relax)
+{
+	if (!strcmp(arg, ("--"CL_ARG_FILTER_OMIT_ALL_BLOBS))) {
+		parse_filter_omit_all_blobs(filter_options);
+		return 1;
+	}
+	if (skip_prefix(arg, ("--"CL_ARG_FILTER_OMIT_LARGE_BLOBS"="), &arg)) {
+		parse_filter_omit_large_blobs(filter_options, arg);
+		return 1;
+	}
+	if (skip_prefix(arg, ("--"CL_ARG_FILTER_USE_SPARSE"="), &arg)) {
+		parse_filter_use_sparse(filter_options, arg);
+		return 1;
+	}
+
+	if (allow_print_manifest &&
+	    !strcmp(arg, ("--"CL_ARG_FILTER_PRINT_MANIFEST))) {
+		parse_filter_print_manifest(filter_options);
+		return 1;
+	}
+
+	if (allow_relax && !strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+		parse_filter_relax(filter_options);
+		return 1;
+	}
+
+	return 0;
+}
+
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+				      const char *arg,
+				      int allow_print_manifest,
+				      int allow_relax)
+{
+	if (!strcmp(arg, CL_ARG_FILTER_OMIT_ALL_BLOBS)) {
+		parse_filter_omit_all_blobs(filter_options);
+		return 1;
+	}
+	if (skip_prefix(arg, (CL_ARG_FILTER_OMIT_LARGE_BLOBS" "), &arg)) {
+		parse_filter_omit_large_blobs(filter_options, arg);
+		return 1;
+	}
+	if (skip_prefix(arg, (CL_ARG_FILTER_USE_SPARSE" "), &arg)) {
+		parse_filter_use_sparse(filter_options, arg);
+		return 1;
+	}
+
+	if (allow_print_manifest &&
+	    !strcmp(arg, CL_ARG_FILTER_PRINT_MANIFEST)) {
+		parse_filter_print_manifest(filter_options);
+		return 1;
+	}
+	if (allow_relax && !strcmp(arg, CL_ARG_FILTER_RELAX)) {
+		parse_filter_relax(filter_options);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/object-filter.h b/object-filter.h
new file mode 100644
index 0000000..f1ca5fb
--- /dev/null
+++ b/object-filter.h
@@ -0,0 +1,145 @@
+#ifndef OBJECT_FILTER_H
+#define OBJECT_FILTER_H
+
+#include "parse-options.h"
+
+/*
+ * Common declarations and utilities for filtering objects (such as omitting
+ * large blobs) during fetch-pack, upload-pack, and the pack-protocol.  These
+ * are intended for partial/narrow clone/fetch.
+ */
+
+struct object_filter_options {
+	/*
+	 * blob-ish path or value that get_sha1_with_context() can turn into
+	 * an OID to find the blob containing the sparse-checkout specification.
+	 * only used when use_sparse is set.
+	 */
+	const char *sparse_value;
+	struct object_id sparse_oid;
+
+	/*
+	 * Blob size byte limit for filtering.  Only blobs smaller than this
+	 * value will be included.  A value of zero omits all blobs.
+	 * Only used when omit_large_blobs is set.  Integer and string versions
+	 * of this are kept for convenience.
+	 */
+	unsigned long large_byte_limit;
+	const char *large_byte_limit_string;
+
+	/* valid filter types (only one may be used at a time) */
+	unsigned omit_all_blobs : 1;
+	unsigned omit_large_blobs : 1;
+	unsigned use_sparse : 1;
+
+	/* true if the filter should output a manifest of the omitted objects. */
+	unsigned print_manifest : 1;
+
+	/* true to suppress missing object errors during consistency checks */
+	unsigned relax : 1;
+};
+
+/*
+ * Return true if a filter is enabled.
+ */
+static inline int object_filter_enabled(const struct object_filter_options *p)
+{
+	return p->omit_all_blobs || p->omit_large_blobs || p->use_sparse;
+}
+
+/* See Documentation/technical/protocol-capabilities.txt */
+#define PROTOCOL_CAPABILITY_FILTER_OBJECTS         "filter-objects"
+
+/* See Documentation/technical/pack-protocol.txt */
+#define PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS     "filter-omit-all-blobs"
+#define PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS   "filter-omit-large-blobs"
+#define PROTOCOL_REQUEST_FILTER_USE_SPARSE         "filter-use-sparse"
+
+/* Normalized command line arguments */
+#define CL_ARG_FILTER_OMIT_ALL_BLOBS     "filter-omit-all-blobs"
+#define CL_ARG_FILTER_OMIT_LARGE_BLOBS   "filter-omit-large-blobs"
+#define CL_ARG_FILTER_USE_SPARSE         "filter-use-sparse"
+#define CL_ARG_FILTER_PRINT_MANIFEST     "filter-print-manifest"
+#define CL_ARG_FILTER_RELAX              "filter-relax"
+
+/*
+ * Common command line argument parsing for object-filter-related
+ * arguments (whether from a hand-parsed or parse-options style
+ * parser).
+ */
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options);
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+				  const char *arg);
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+			    const char *arg);
+int parse_filter_print_manifest(struct object_filter_options *filter_options);
+int parse_filter_relax(struct object_filter_options *filter_options);
+
+/*
+ * Common command line argument parsers for object-filter-related
+ * arguments coming from parse-options style parsers.
+ */
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+				    const char *arg, int unset);
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+				      const char *arg, int unset);
+int opt_parse_filter_use_sparse(const struct option *opt,
+				const char *arg, int unset);
+int opt_parse_filter_print_manifest(const struct option *opt,
+				    const char *arg, int unset);
+int opt_parse_filter_relax(const struct option *opt,
+			   const char *arg, int unset);
+
+#define OPT_PARSE_FILTER_OMIT_ALL_BLOBS(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_ALL_BLOBS, fo, NULL, \
+	  N_("omit all blobs from result"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+	  opt_parse_filter_omit_all_blobs }
+
+#define OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_LARGE_BLOBS, fo, N_("size"), \
+	  N_("omit large blobs from result"), PARSE_OPT_NONEG, \
+	  opt_parse_filter_omit_large_blobs }
+
+#define OPT_PARSE_FILTER_USE_SPARSE(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_USE_SPARSE, fo, N_("object"), \
+	  N_("filter results using sparse-checkout specification"), PARSE_OPT_NONEG, \
+	  opt_parse_filter_use_sparse }
+
+#define OPT_PARSE_FILTER_PRINT_MANIFEST(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_PRINT_MANIFEST, fo, NULL,	\
+	  N_("print manifest of omitted objects"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+	  opt_parse_filter_print_manifest }
+
+#define OPT_PARSE_FILTER_RELAX(fo) \
+	{ OPTION_CALLBACK, 0, CL_ARG_FILTER_RELAX, fo, NULL, \
+	  N_("relax consistency checks for previously omitted objects"), \
+	  PARSE_OPT_NOARG | PARSE_OPT_NONEG, opt_parse_filter_relax }
+
+/*
+ * Hand parse known object-filter command line options.
+ * Use this when the caller DOES NOT use the normal OPT_
+ * routines.
+ *
+ * Here we assume args of the form "--<key>" or "--<key>=<value>".
+ * Note the literal dash-dash and equals.
+ *
+ * Returns 1 if we handled the argument.
+ */
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+				 const char *arg,
+				 int allow_print_manifest,
+				 int allow_relax);
+
+/*
+ * Hand parse known object-filter protocol lines.
+ *
+ * Here we assume args of the form "<key>" or "<key> <value>".
+ * Note the literal space between the key and value.
+ */
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+				      const char *arg,
+				      int allow_print_manifest,
+				      int allow_relax);
+
+#endif /* OBJECT_FILTER_H */
-- 
2.9.3


* [PATCH v2 08/19] rev-list: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach rev-list to use the filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

This feature is only enabled when one of the "--objects*"
options is used.

When the "--filter-print-manifest" option is used, the
omitted objects and their sizes are printed at the end.
These are marked with a "~".  This can be combined with
"--quiet" to get a list of just the omitted objects.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
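The manifest line format described above ("~" followed by the object
ID, then the size when it is known) can be sketched on its own.  This
is a standalone illustration rather than the rev-list code:
format_manifest_line() is a hypothetical helper, and it assumes that a
size of -1 means "unknown", as the check in print_omitted_object()
suggests.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one manifest line into buf: "~<hex-oid>\n" when the size is
 * unknown (negative), otherwise "~<hex-oid> <size>\n".  Returns the
 * value returned by snprintf().
 */
int format_manifest_line(char *buf, size_t buflen,
			 const char *hex_oid, int64_t size)
{
	assert(buf && hex_oid && strlen(hex_oid) > 0);
	if (size < 0)
		return snprintf(buf, buflen, "~%s\n", hex_oid);
	/* cast so the argument matches the PRIuMAX conversion */
	return snprintf(buf, buflen, "~%s %" PRIuMAX "\n",
			hex_oid, (uintmax_t)size);
}
```

The explicit cast matters because passing a signed 64-bit value to a
PRIuMAX conversion without it is not portable.
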
 builtin/rev-list.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bcf77f0..fd9a7e5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
 #include "pack.h"
 #include "pack-bitmap.h"
 #include "builtin.h"
@@ -52,6 +54,7 @@ static const char rev_list_usage[] =
 
 static struct progress *progress;
 static unsigned progress_counter;
+static struct object_filter_options filter_options;
 
 static void finish_commit(struct commit *commit, void *data);
 static void show_commit(struct commit *commit, void *data)
@@ -178,8 +181,20 @@ static void finish_commit(struct commit *commit, void *data)
 static void finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+		if (filter_options.relax) {
+			/*
+			 * Relax consistency checks to not complain about
+			 * omitted objects (presumably caused by a previous
+			 * use of the 'filter-objects' feature).
+			 *
+			 * Note that this is independent of any filtering that
+			 * we are doing in this run.
+			 */
+			return;
+		}
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(obj->oid.hash);
 }
@@ -199,6 +214,16 @@ static void show_edge(struct commit *commit)
 	printf("-%s\n", oid_to_hex(&commit->object.oid));
 }
 
+static void print_omitted_object(int i, int i_limit, struct oidset2_entry *e, void *cb_data)
+{
+	/* struct rev_list_info *info = cb_data; */
+
+	if (e->object_length == -1)
+		printf("~%s\n", oid_to_hex(&e->oid));
+	else
+		printf("~%s %"PRIuMAX"\n", oid_to_hex(&e->oid), (uintmax_t)e->object_length);
+}
+
 static void print_var_str(const char *var, const char *val)
 {
 	printf("%s='%s'\n", var, val);
@@ -276,6 +301,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
 	const char *show_progress = NULL;
+	oidset2_foreach_cb fn_filter_print = NULL;
 
 	git_config(git_default_config, NULL);
 	init_revisions(&revs, prefix);
@@ -329,6 +355,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
+		if (object_filter_hand_parse_arg(&filter_options, arg, 1, 1)) {
+			if (!revs.blob_objects)
+				die(_("object filtering requires --objects"));
+			if (filter_options.use_sparse &&
+			    !oidcmp(&filter_options.sparse_oid, &null_oid))
+				die(_("invalid sparse value"));
+			continue;
+		}
 		usage(rev_list_usage);
 
 	}
@@ -353,6 +387,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
+	if (object_filter_enabled(&filter_options)) {
+		if (use_bitmap_index)
+			die(_("cannot combine --use-bitmap-index with object filtering"));
+	}
+
 	save_commit_buffer = (revs.verbose_header ||
 			      revs.grep_filter.pattern_list ||
 			      revs.grep_filter.header_list);
@@ -397,7 +436,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
-	traverse_commit_list(&revs, show_commit, show_object, &info);
+	if (filter_options.print_manifest)
+		fn_filter_print = print_omitted_object;
+
+	if (filter_options.omit_all_blobs)
+		traverse_commit_list_omit_all_blobs(
+			&revs, show_commit, show_object, fn_filter_print, &info);
+	else if (filter_options.omit_large_blobs)
+		traverse_commit_list_omit_large_blobs(
+			&revs, show_commit, show_object, fn_filter_print, &info,
+			(int64_t)(uint64_t)filter_options.large_byte_limit);
+	else if (filter_options.use_sparse)
+		traverse_commit_list_use_sparse(
+			&revs, show_commit, show_object, fn_filter_print, &info,
+			&filter_options.sparse_oid);
+	else
+		traverse_commit_list(&revs, show_commit, show_object, &info);
 
 	stop_progress(&progress);
 
-- 
2.9.3


* [PATCH v2 09/19] rev-list: add filtering help text
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
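The "<n>[kmg]" size syntax documented in this patch, together with the
"smaller than the limit" rule applied by the large-blob filter, can be
sketched as below.  This is a standalone illustration, not the parser
used by the series: parse_size_limit() and blob_is_kept() are
hypothetical names, and accepting upper-case units is an assumption
made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<digits>[kmg]" into bytes; 0 on success, -1 on bad input. */
int parse_size_limit(const char *arg, uint64_t *out)
{
	char *end;
	unsigned long long n;
	unsigned shift = 0;

	assert(arg && out);
	if (!isdigit((unsigned char)arg[0]))
		return -1;      /* rejects empty strings, signs, whitespace */
	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno)
		return -1;      /* out of range for strtoull */
	switch (tolower((unsigned char)*end)) {
	case 'k': shift = 10; end++; break;
	case 'm': shift = 20; end++; break;
	case 'g': shift = 30; end++; break;
	}
	if (*end)
		return -1;      /* trailing junk after the unit */
	if (shift && n > (UINT64_MAX >> shift))
		return -1;      /* scaling would overflow 64 bits */
	*out = (uint64_t)n << shift;
	return 0;
}

/* Blobs strictly smaller than the limit are kept; a limit of 0 keeps none. */
int blob_is_kept(uint64_t blob_size, uint64_t limit)
{
	return blob_size < limit;
}
```

With these definitions "10k" parses to 10240 bytes, and a limit of zero
keeps no blobs at all, which matches the statement that the value may
be zero.
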
 Documentation/git-rev-list.txt     |  7 ++++++-
 Documentation/rev-list-options.txt | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..d20c2ab 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,12 @@ SYNOPSIS
 	     [ --fixed-strings | -F ]
 	     [ --date=<format>]
 	     [ [ --objects | --objects-edge | --objects-edge-aggressive ]
-	       [ --unpacked ] ]
+	       [ --unpacked ]
+	       [ [ --filter-omit-all-blobs |
+		   --filter-omit-large-blobs=<n>[kmg] |
+		   --filter-use-sparse=<object> ]
+		 [ --filter-print-manifest ] ] ]
+	     [ --filter-relax ]
 	     [ --pretty | --header ]
 	     [ --bisect ]
 	     [ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f732..e0112dd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -693,6 +693,32 @@ ifdef::git-rev-list[]
 --unpacked::
 	Only useful with `--objects`; print the object IDs that are not
 	in packs.
+
+--filter-omit-all-blobs::
+	Only useful with one of the `--objects*` options; omits all blobs
+	from the printed list of objects.
+
+--filter-omit-large-blobs=<n>[kmg]::
+	Only useful with one of the `--objects*` options; omits blobs of
+	n bytes or larger from the printed list of objects.  The value may
+	be followed by a 'k', 'm', or 'g' unit and may be zero.  Special
+	files (matching ".git*") are always included, regardless of size.
+
+--filter-use-sparse=<object>::
+	Only useful with one of the `--objects*` options; uses the
+	sparse-checkout specification contained in the given object to
+	filter the result by omitting blobs that would not be used by the
+	corresponding sparse checkout.
+
+--filter-print-manifest::
+	Only useful with one of the above `--filter*` options; prints a
+	manifest of the omitted objects.  Object IDs are prefixed with a
+	``~'' character.  The object size is printed after the ID.
+
+--filter-relax::
+	Relax consistency checking for missing blobs.  Do not warn of
+	missing blobs during normal (non-filtering) object traversal
+	following an earlier partial/narrow clone or fetch.
 endif::git-rev-list[]
 
 --no-walk[=(sorted|unsorted)]::
-- 
2.9.3


* [PATCH v2 10/19] t6112: rev-list object filtering test
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (8 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t6112-rev-list-filters-objects.sh | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 t/t6112-rev-list-filters-objects.sh

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
new file mode 100644
index 0000000..ded2b04
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	for n in 1 2 3 4 5 ; do
+		echo $n >file.$n &&
+		git add file.$n &&
+		git commit -m "$n" || return 1
+	done
+'
+
+test_expect_success 'omit-all-blobs omitted 5 blobs' '
+	git rev-list HEAD --objects --filter-print-manifest --filter-omit-all-blobs >omit_all &&
+	grep "^~" omit_all >omitted &&
+	test_line_count = 5 omitted
+'
+
+test_expect_success 'omit-all-blobs blob sha match' '
+	git rev-list HEAD --objects >normal &&
+	awk "/file/ {print \$1;}" <normal | sort >normal_sha &&
+	sed "s/~//" <omitted | awk "{print \$1;}" | sort >omit_all_sha &&
+	test_cmp normal_sha omit_all_sha
+'
+
+test_expect_success 'omit-all-blobs nothing else changed' '
+	grep -v "file" <normal | sort >normal_other &&
+	grep -v "~" <omit_all | sort >omit_other &&
+	test_cmp normal_other omit_other
+'
+
+# TODO test filter-omit-large-blobs
+# TODO test filter-use-sparse
+
+test_done
-- 
2.9.3



* [PATCH v2 11/19] pack-objects: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (9 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach pack-objects to use filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

This feature is intended for narrow/partial clone/fetch.

Filtering requires the "--stdout" option.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/pack-objects.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 50e01aa..614ad60 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
 #include "pack-objects.h"
 #include "progress.h"
 #include "refs.h"
@@ -77,6 +79,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static struct object_filter_options filter_options;
+
 /*
  * stats
  */
@@ -2800,7 +2804,20 @@ static void get_object_list(int ac, const char **av)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(&revs, show_edge);
-	traverse_commit_list(&revs, show_commit, show_object, NULL);
+
+	if (filter_options.omit_all_blobs)
+		traverse_commit_list_omit_all_blobs(
+			&revs, show_commit, show_object, NULL, NULL);
+	else if (filter_options.omit_large_blobs)
+		traverse_commit_list_omit_large_blobs(
+			&revs, show_commit, show_object, NULL, NULL,
+			(int64_t)(uint64_t)filter_options.large_byte_limit);
+	else if (filter_options.use_sparse)
+		traverse_commit_list_use_sparse(
+			&revs, show_commit, show_object, NULL, NULL,
+			&filter_options.sparse_oid);
+	else
+		traverse_commit_list(&revs, show_commit, show_object, NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
@@ -2936,6 +2953,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("use a bitmap index if available to speed up counting objects")),
 		OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 			 N_("write a bitmap index together with the pack index")),
+
+		OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+		OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+		OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+		/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+		/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 		OPT_END(),
 	};
 
@@ -3007,6 +3032,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!pack_to_stdout && thin)
 		die("--thin cannot be used to build an indexable pack.");
 
+	if (!pack_to_stdout && object_filter_enabled(&filter_options))
+		die("object filtering cannot be used when building an indexable pack.");
+
 	if (keep_unreachable && unpack_unreachable)
 		die("--keep-unreachable and --unpack-unreachable are incompatible.");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
@@ -3031,6 +3059,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
 		use_bitmap_index = 0;
 
+	if (object_filter_enabled(&filter_options))
+		use_bitmap_index = 0;
+
 	if (pack_to_stdout || !rev_list_all)
 		write_bitmap_index = 0;
 
-- 
2.9.3
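
A note on the dispatch in get_object_list() above: because it is an
if/else chain, at most one filter takes effect, checked in the order
omit-all-blobs, omit-large-blobs, use-sparse.  A minimal stand-alone
sketch of that precedence and of the --stdout requirement follows.  The
field names mirror the patch, but `struct filter_opts`, `enum
filter_kind`, `select_filter()` and `filter_args_ok()` are hypothetical
names used only for illustration; they are not part of this series.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct object_filter_options. */
struct filter_opts {
	bool omit_all_blobs;
	bool omit_large_blobs;
	bool use_sparse;
};

enum filter_kind {
	FILTER_NONE,
	FILTER_OMIT_ALL_BLOBS,
	FILTER_OMIT_LARGE_BLOBS,
	FILTER_USE_SPARSE
};

/* Mirror the if/else chain: the first enabled filter wins. */
static enum filter_kind select_filter(const struct filter_opts *o)
{
	if (o->omit_all_blobs)
		return FILTER_OMIT_ALL_BLOBS;
	if (o->omit_large_blobs)
		return FILTER_OMIT_LARGE_BLOBS;
	if (o->use_sparse)
		return FILTER_USE_SPARSE;
	return FILTER_NONE;
}

/* Filtering is only accepted when the pack is streamed to stdout. */
static bool filter_args_ok(const struct filter_opts *o, bool pack_to_stdout)
{
	return pack_to_stdout || select_filter(o) == FILTER_NONE;
}
```

If both a large-blob limit and a sparse specification are given, this
sketch (like the chain in the patch) silently applies only the former.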



* [PATCH v2 12/19] pack-objects: add filtering help text
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (10 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Update pack-objects help text to describe object filtering.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/git-pack-objects.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 8973510..084641f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -231,6 +231,20 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
 	With this option, parents that are hidden by grafts are packed
 	nevertheless.
 
+--filter-omit-all-blobs::
+	Omits all blobs from the packfile.  This option requires --stdout.
+
+--filter-omit-large-blobs=<n>[kmg]::
+	Omits blobs larger than <n> bytes from the packfile.  <n> may be
+	followed by a 'k', 'm', or 'g' unit suffix.  Value may be zero.  Special
+	files (matching ".git*") are always included, regardless of size.
+	This option requires --stdout.
+
+--filter-use-sparse=<object>::
+	Uses a sparse-checkout specification given by <object> to filter
+	the result by omitting blobs that would not be used by the
+	corresponding sparse checkout.  This option requires --stdout.
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
-- 
2.9.3
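
To make the `<n>[kmg]` syntax and the ".git*" exemption described above
concrete, here is a small stand-alone sketch.  It assumes the units are
powers of 1024 (as git's usual size parsing does) and that "larger than"
means strictly greater.  `parse_size_limit()` and `omit_large_blob()`
are hypothetical helpers for illustration, not functions from this
series.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<n>[kmg]" into a byte count.
 * Returns 0 on success, -1 on malformed input or overflow.
 * Assumption: k, m and g are 1024-based multipliers.
 */
static int parse_size_limit(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val, factor = 1;

	if (!s || *s < '0' || *s > '9')
		return -1;
	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno)
		return -1;
	switch (*end) {
	case 'k': factor = 1024UL; end++; break;
	case 'm': factor = 1024UL * 1024; end++; break;
	case 'g': factor = 1024UL * 1024 * 1024; end++; break;
	}
	if (*end)
		return -1; /* trailing garbage after the unit */
	if (val > ULONG_MAX / factor)
		return -1; /* would overflow */
	*out = val * factor;
	return 0;
}

/* A blob is omitted only if it exceeds the limit and is not ".git*". */
static bool omit_large_blob(const char *basename, unsigned long size,
			    unsigned long limit)
{
	if (!strncmp(basename, ".git", 4))
		return false;
	return size > limit;
}
```

With a limit of zero, every non-empty blob is omitted except the
".git*" special files, which matches "Value may be zero" above.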



* [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (11 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/pack-protocol.txt         | 16 ++++++++++++++++
 Documentation/technical/protocol-capabilities.txt |  7 +++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a349171..dce6e04 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       [filter-request]
 		       flush-pkt
 
   want-list         =  first-want
@@ -226,7 +227,13 @@ out of what the server said it could do with the first 'want' line.
   first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)
 
+  filter-request    =  PKT-LINE("filter-omit-all-blobs") /
+		       PKT-LINE("filter-omit-large-blobs" SP magnitude) /
+		       PKT-LINE("filter-use-sparse" SP obj-id)
+
   depth             =  1*DIGIT
+
+  magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +256,15 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial/narrow clone/fetch operations.
+"filter-omit-all-blobs" requests that all blobs be omitted from
+the packfile.  "filter-omit-large-blobs" requests that blobs larger
+than the requested size be omitted, unless they have a ".git*"
+special filename.  "filter-use-sparse" requests blob filtering based
+upon a sparse-checkout specification in the given blob id.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..7011eb3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter-objects
+--------------
+
+If the upload-pack server advertises the 'filter-objects' capability,
+fetch-pack may send "filter-*" commands to request a partial/narrow
+clone/fetch where the server omits various objects from the packfile.
-- 
2.9.3
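
For readers less familiar with the framing used by `filter-request`:
a pkt-line is four hex digits giving the total length (the four header
bytes included), followed by the payload.  The sketch below shows how
one of the new request lines would be framed.  `pkt_line()` is a
hypothetical helper written for this note; it is not git's
packet_buf_write().

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame one payload as a pkt-line in buf: four lowercase hex digits
 * holding the total length (header included), then the payload.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the result would exceed the 65520-byte pkt-line maximum or bufsize.
 */
static int pkt_line(char *buf, size_t bufsize, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 65520 || len + 1 > bufsize)
		return -1;
	snprintf(buf, bufsize, "%04x%s", (unsigned int)len, payload);
	return (int)len;
}
```

For example, framing "filter-omit-all-blobs" (21 bytes) yields
"0019filter-omit-all-blobs", since 21 + 4 = 25 = 0x19.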



* [PATCH v2 14/19] upload-pack: add object filtering
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (12 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 upload-pack.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/upload-pack.c b/upload-pack.c
index ffb028d..c709054 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -17,6 +17,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "object-filter.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git upload-pack [<options>] <dir>"),
@@ -63,6 +64,9 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int capability_filter_objects_requested;
+static struct object_filter_options filter_options;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -131,6 +135,30 @@ static void create_pack_file(void)
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
 
+	if (filter_options.omit_all_blobs)
+		argv_array_push(&pack_objects.args,
+				("--" CL_ARG_FILTER_OMIT_ALL_BLOBS));
+	else if (filter_options.omit_large_blobs) {
+		if (filter_options.large_byte_limit_string)
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+					 filter_options.large_byte_limit_string);
+		else
+			argv_array_pushf(&pack_objects.args, "--%s=%lu",
+					 CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+					 filter_options.large_byte_limit);
+	}
+	else if (filter_options.use_sparse) {
+		if (oidcmp(&filter_options.sparse_oid, &null_oid))
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_USE_SPARSE,
+					 oid_to_hex(&filter_options.sparse_oid));
+		else
+			argv_array_pushf(&pack_objects.args, "--%s=%s",
+					 CL_ARG_FILTER_USE_SPARSE,
+					 filter_options.sparse_value);
+	}
+
 	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
@@ -793,6 +821,12 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (object_filter_hand_parse_protocol(&filter_options, line, 0, 0)) {
+			if (!capability_filter_objects_requested)
+				die("git upload-pack: object filtering requires '%s' capability",
+				    PROTOCOL_CAPABILITY_FILTER_OBJECTS);
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_sha1_hex(arg, sha1_buf))
 			die("git upload-pack: protocol error, "
@@ -820,6 +854,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+			capability_filter_objects_requested = 1;
 
 		o = parse_object(sha1_buf);
 		if (!o) {
@@ -928,7 +964,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow deepen-since deepen-not"
-		" deepen-relative no-progress include-tag multi_ack_detailed";
+		" deepen-relative no-progress include-tag multi_ack_detailed"
+		" " PROTOCOL_CAPABILITY_FILTER_OBJECTS;
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.9.3
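
The receive_needs() hunk above boils down to: recognize a "filter-*"
line, and reject it unless the client asked for the capability on its
first "want" line.  A stand-alone sketch of that control flow follows.
The request strings come from pack-protocol.txt in patch 13;
`classify_line()` and `accept_filter_line()` are hypothetical names,
and where the real code calls die() the sketch returns -1.

```c
#include <string.h>

enum req_kind {
	REQ_NOT_FILTER,
	REQ_OMIT_ALL_BLOBS,
	REQ_OMIT_LARGE_BLOBS,
	REQ_USE_SPARSE
};

/*
 * Classify one request line.  For filters that take an argument,
 * *arg is set to point at it; otherwise *arg is NULL.
 */
static enum req_kind classify_line(const char *line, const char **arg)
{
	static const char large[] = "filter-omit-large-blobs ";
	static const char sparse[] = "filter-use-sparse ";

	*arg = NULL;
	if (!strcmp(line, "filter-omit-all-blobs"))
		return REQ_OMIT_ALL_BLOBS;
	if (!strncmp(line, large, sizeof(large) - 1)) {
		*arg = line + sizeof(large) - 1;
		return REQ_OMIT_LARGE_BLOBS;
	}
	if (!strncmp(line, sparse, sizeof(sparse) - 1)) {
		*arg = line + sizeof(sparse) - 1;
		return REQ_USE_SPARSE;
	}
	return REQ_NOT_FILTER;
}

/*
 * Returns 1 if the line was consumed as a filter request, 0 if it is
 * not a filter line (the caller goes on to try "want"), and -1 if a
 * filter was sent without the capability having been requested.
 */
static int accept_filter_line(const char *line, int capability_requested,
			      enum req_kind *kind, const char **arg)
{
	*kind = classify_line(line, arg);
	if (*kind == REQ_NOT_FILTER)
		return 0;
	return capability_requested ? 1 : -1;
}
```

The ordering works because the capability list travels on the first
"want" line, which precedes any filter-request in an upload-request.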



* [PATCH v2 15/19] fetch-pack: add object filtering support
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (13 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c |  3 +++
 fetch-pack.c         | 28 ++++++++++++++++++++++++++++
 fetch-pack.h         |  2 ++
 transport.c          | 27 +++++++++++++++++++++++++++
 transport.h          |  8 ++++++++
 5 files changed, 68 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..72f9063 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.update_shallow = 1;
 			continue;
 		}
+		if (object_filter_hand_parse_arg(&args.filter_options, arg, 0, 0)) {
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b05..642077d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -374,6 +374,8 @@ static int find_common(struct fetch_pack_args *args,
 			if (prefer_ofs_delta)   strbuf_addstr(&c, " ofs-delta");
 			if (deepen_since_ok)    strbuf_addstr(&c, " deepen-since");
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
+			if (object_filter_enabled(&args->filter_options))
+				strbuf_addstr(&c, (" " PROTOCOL_CAPABILITY_FILTER_OBJECTS));
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
@@ -404,6 +406,18 @@ static int find_common(struct fetch_pack_args *args,
 			packet_buf_write(&req_buf, "deepen-not %s", s->string);
 		}
 	}
+
+	if (args->filter_options.omit_all_blobs)
+		packet_buf_write(&req_buf, PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS);
+	else if (args->filter_options.omit_large_blobs)
+		packet_buf_write(&req_buf,
+				 PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS " %lu",
+				 args->filter_options.large_byte_limit);
+	else if (args->filter_options.use_sparse)
+		packet_buf_write(&req_buf,
+				 PROTOCOL_REQUEST_FILTER_USE_SPARSE " %s",
+				 args->filter_options.sparse_value);
+
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
 
@@ -811,6 +825,15 @@ static int get_pack(struct fetch_pack_args *args,
 					"--keep=fetch-pack %"PRIuMAX " on %s",
 					(uintmax_t)getpid(), hostname);
 		}
+
+		/*
+		 * Relax consistency check to allow missing blobs (presumably
+		 * because they are exactly the set that we requested be
+		 * omitted).
+		 */
+		if (object_filter_enabled(&args->filter_options))
+			argv_array_push(&cmd.args, ("--" CL_ARG_FILTER_RELAX));
+
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 	}
@@ -924,6 +947,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		prefer_ofs_delta = 0;
 
+	if (server_supports(PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+		print_verbose(args, _("Server supports "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+	else if (object_filter_enabled(&args->filter_options))
+		die(_("Server does not support "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+
 	if ((agent_feature = server_feature_value("agent", &agent_len))) {
 		agent_supported = 1;
 		if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43..5e6bf3b 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "object-filter.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
+	struct object_filter_options filter_options;
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/transport.c b/transport.c
index 4d33138..7abf0b6 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,32 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_ALL_BLOBS)) {
+		opts->filter_options.omit_all_blobs = !!value;
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS)) {
+		opts->filter_options.omit_large_blobs = 1;
+		opts->filter_options.large_byte_limit_string = value;
+		if (!value)
+			opts->filter_options.large_byte_limit = 0;
+		else if (!git_parse_ulong(value,
+					  &opts->filter_options.large_byte_limit))
+			die(_("transport: invalid filter value '%s'"), value);
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FILTER_USE_SPARSE)) {
+		opts->filter_options.use_sparse = 1;
+		opts->filter_options.sparse_value = value;
+		/*
+		 * We're constrained by the API for this set_ operation and
+		 * only take a single value.  We don't want to do the get_sha1*()
+		 * lookup (possibly for the second time), because the caller
+		 * should already know and have normalized the hex OID string
+		 * (assuming that it used the normal parsing methods).  So we
+		 * assume that the above string value is sufficient here and
+		 * can just NULL the binary OID field.
+		 */
+		oidcpy(&opts->filter_options.sparse_oid, &null_oid);
+		return 0;
 	}
 	return 1;
 }
@@ -228,6 +254,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
+	args.filter_options = data->options.filter_options;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index bc55715..490f827 100644
--- a/transport.h
+++ b/transport.h
@@ -4,6 +4,8 @@
 #include "cache.h"
 #include "run-command.h"
 #include "remote.h"
+#include "fetch-pack.h"
+#include "object-filter.h"
 
 struct string_list;
 
@@ -21,6 +23,7 @@ struct git_transport_options {
 	const char *uploadpack;
 	const char *receivepack;
 	struct push_cas_option *cas;
+	struct object_filter_options filter_options;
 };
 
 enum transport_family {
@@ -210,6 +213,11 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* See Documentation/technical/pack-protocol.txt */
+#define TRANS_OPT_FILTER_OMIT_ALL_BLOBS   "filter-omit-all-blobs"
+#define TRANS_OPT_FILTER_OMIT_LARGE_BLOBS "filter-omit-large-blobs"
+#define TRANS_OPT_FILTER_USE_SPARSE       "filter-use-sparse"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.9.3
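
The do_fetch_pack() hunk above makes the client abort when it wants
filtering but the server did not advertise 'filter-objects'.  The
following stand-alone sketch shows that decision over a space-separated
capability list.  `has_capability()` and `check_filter_support()` are
hypothetical helpers for illustration; the real code uses
server_supports() and die().

```c
#include <stdbool.h>
#include <string.h>

/* True if `name` appears as a whole space-separated token in `list`. */
static bool has_capability(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	if (!n)
		return false;
	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == list || p[-1] == ' ');
		bool ends = (p[n] == '\0' || p[n] == ' ' || p[n] == '=');

		if (starts && ends)
			return true;
		p++; /* keep scanning past this partial match */
	}
	return false;
}

/* 0 = proceed, -1 = abort: a filter was requested but is unsupported. */
static int check_filter_support(const char *server_caps, bool filter_enabled)
{
	if (filter_enabled && !has_capability(server_caps, "filter-objects"))
		return -1;
	return 0;
}
```

Matching whole tokens matters here: a hypothetical capability named
"filter-objects-v2" should not be mistaken for "filter-objects".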



* [PATCH v2 16/19] connected: add filter_allow_omitted option to API
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (14 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 connected.c | 3 +++
 connected.h | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/connected.c b/connected.c
index 136c2ac..c25b816 100644
--- a/connected.c
+++ b/connected.c
@@ -62,6 +62,9 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 		argv_array_pushf(&rev_list.args, "--progress=%s",
 				 _("Checking connectivity"));
 
+	if (opt->filter_relax)
+		argv_array_push(&rev_list.args, ("--" CL_ARG_FILTER_RELAX));
+
 	rev_list.git_cmd = 1;
 	rev_list.env = opt->env;
 	rev_list.in = -1;
diff --git a/connected.h b/connected.h
index 4ca325f..370710e 100644
--- a/connected.h
+++ b/connected.h
@@ -35,6 +35,12 @@ struct check_connected_options {
 	int progress;
 
 	/*
+	 * Relax consistency checks for missing blobs (presumably
+	 * due to use of the 'filter-objects' feature).
+	 */
+	int filter_relax;
+
+	/*
 	 * Insert these variables into the environment of the child process.
 	 */
 	const char **env;
-- 
2.9.3



* [PATCH v2 17/19] clone: add filter arguments
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (15 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index a6ae7d6..1408396 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -24,6 +24,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "object-filter.h"
 
 /*
  * Overall FIXMEs:
@@ -57,6 +58,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct object_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -130,6 +132,14 @@ static struct option builtin_clone_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+
+	OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+	/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+	/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 	OPT_END()
 };
 
@@ -643,6 +653,13 @@ static void update_remote_refs(const struct ref *refs,
 	if (check_connectivity) {
 		struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
+		/*
+		 * Relax consistency check to allow missing blobs (presumably
+		 * because they are exactly the set that we requested be
+		 * omitted).
+		 */
+		opt.filter_relax = object_filter_enabled(&filter_options);
+
 		opt.transport = transport;
 		opt.progress = transport->progress;
 
@@ -1059,6 +1076,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			warning(_("--shallow-since is ignored in local clones; use file:// instead."));
 		if (option_not.nr)
 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+		if (object_filter_enabled(&filter_options))
+			warning(_("--filter-* options are ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1090,6 +1109,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
+	if (filter_options.omit_all_blobs)
+		transport_set_option(transport, TRANS_OPT_FILTER_OMIT_ALL_BLOBS, "1");
+	if (filter_options.omit_large_blobs)
+		transport_set_option(transport, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS,
+				     filter_options.large_byte_limit_string);
+	if (filter_options.use_sparse)
+		transport_set_option(transport, TRANS_OPT_FILTER_USE_SPARSE,
+				     filter_options.sparse_value);
+
 	if (transport->smart_options && !deepen)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
-- 
2.9.3



* [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (16 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/index-pack.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4ff567d..30ff409 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -11,6 +11,7 @@
 #include "exec_cmd.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "object-filter.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -80,6 +81,7 @@ static int verbose;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
+static int filter_relax;
 
 static struct progress *progress;
 
@@ -220,6 +222,17 @@ static unsigned check_object(struct object *obj)
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
 		int type = sha1_object_info(obj->oid.hash, &size);
+
+		if (type <= 0 && filter_relax) {
+			/*
+			 * Relax consistency checks to not complain about
+			 * omitted objects (presumably caused by use of
+			 * the 'filter-objects' feature).
+			 */
+			obj->flags |= FLAG_CHECKED;
+			return 0;
+		}
+
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
@@ -1721,6 +1734,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (!strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+				filter_relax = 1;
 			} else
 				usage(index_pack_usage);
 			continue;
-- 
2.9.3
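
One consequence of the check_object() hunk above is worth spelling out:
when the object is missing, its type is unknown (type <= 0), so the
relaxed path cannot tell an omitted blob from a missing tree or commit.
A minimal sketch of the resulting decision table follows; `enum
check_result` and `check_presence()` are hypothetical names for
illustration only.

```c
enum check_result {
	CHECK_OK,              /* object is present, carry on checking */
	CHECK_SKIPPED_MISSING, /* missing, tolerated because of relax */
	CHECK_FATAL            /* missing, and relax is off */
};

/*
 * `type` follows the convention in the hunk: a value <= 0 means the
 * object could not be found.  No object kind is available in that
 * case, so relaxing applies to any missing object, not only blobs.
 */
static enum check_result check_presence(int type, int filter_relax)
{
	if (type <= 0)
		return filter_relax ? CHECK_SKIPPED_MISSING : CHECK_FATAL;
	return CHECK_OK;
}
```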



* [PATCH v2 19/19] fetch: add object filtering to fetch
  2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
                   ` (17 preceding siblings ...)
  2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
  18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f2c2ab..306c165 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "object-filter.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -52,6 +53,7 @@ static const char *recurse_submodules_default;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct object_filter_options filter_options;
 
 static int option_parse_recurse_submodules(const struct option *opt,
 				   const char *arg, int unset)
@@ -141,6 +143,14 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+
+	OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+	/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+	/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 	OPT_END()
 };
 
@@ -733,6 +743,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
+	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
 
 	fp = fopen(filename, "a");
 	if (!fp)
@@ -744,7 +762,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_connected(iterate_ref_map, &rm, NULL)) {
+	if (check_connected(iterate_ref_map, &rm, &opt)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -885,6 +903,13 @@ static int quickfetch(struct ref *ref_map)
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
 	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
+
+	/*
 	 * If we are deepening a shallow clone we already have these
 	 * objects reachable.  Running rev-list here will return with
 	 * a good (0) exit status and we'll bypass the fetch that we
-- 
2.9.3

