* [PATCH v2 00/19] WIP object filtering for partial clone
@ 2017-07-13 17:34 Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
` (18 more replies)
0 siblings, 19 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
This WIP is a follow-up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]
Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback. This hides the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to actual commands, such as pack-objects.
This version adds that same filtering capability to rev-list allowing
filtering to be demonstrated without building a packfile. Filtered
blobs are printed with a leading "~" (along with their sizes).
$ ./git rev-list --objects HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
~306c16551e548ace12c709a332bfea22adcc395f 40732
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
~306c16551e548ace12c709a332bfea22adcc395f 40732
This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).
2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
(but always including ".git*" special files).
3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
corresponding sparse-checkout.
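To make the <n>[kmg] size argument concrete, here is a minimal sketch of
how such a value could be turned into a byte count. The helper name
parse_size_kmg() is hypothetical and is not part of this series; git has
its own config-parsing helpers, and this sketch only assumes the usual
meaning of the suffixes (k = 2^10, m = 2^20, g = 2^30).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch series): parse "<n>[kmg]"
 * into a byte count. Returns 0 on success, -1 on malformed input
 * or overflow.
 */
static int parse_size_kmg(const char *arg, int64_t *out)
{
	char *end;
	unsigned long long n;
	int shift = 0;

	assert(out != NULL);

	/* Reject empty input, signs and leading whitespace up front. */
	if (!arg || !isdigit((unsigned char)arg[0]))
		return -1;

	errno = 0;
	n = strtoull(arg, &end, 10);
	if (errno)
		return -1;

	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	default: break;
	}
	if (*end)
		return -1; /* trailing garbage after the suffix */
	if (n > ((unsigned long long)INT64_MAX >> shift))
		return -1; /* scaled value would not fit in int64_t */

	*out = (int64_t)(n << shift);
	return 0;
}
```

The result is kept in an int64_t only to match the signed length field
used by the filters later in the series.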
Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement should be able to also filter unneeded tree objects.
This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.
As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks. Maintaining info
on missing blobs is currently being discussed in [3].
TODO
1. Incorporate with a patch series like [4] to dynamically fetch a
missing blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
to them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.
[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
Jeff Hostetler (19):
dir: refactor add_excludes()
oidset2: create oidset subclass with object length and pathname
list-objects: filter objects in traverse_commit_list
list-objects-filters: add omit-all-blobs filter
list-objects-filters: add omit-large-blobs filter
list-objects-filters: add use-sparse-checkout filter
object-filter: common declarations for object filtering
rev-list: add object filtering support
rev-list: add filtering help text
t6112: rev-list object filtering test
pack-objects: add object filtering support
pack-objects: add filtering help text
upload-pack: add filter-objects to protocol documentation
upload-pack: add object filtering
fetch-pack: add object filtering support
connected: add filter_allow_omitted option to API
clone: add filter arguments
index-pack: relax consistency checks for omitted objects
fetch: add object filtering to fetch
Documentation/git-pack-objects.txt | 14 +
Documentation/git-rev-list.txt | 7 +-
Documentation/rev-list-options.txt | 26 ++
Documentation/technical/pack-protocol.txt | 16 +
Documentation/technical/protocol-capabilities.txt | 7 +
Makefile | 3 +
builtin/clone.c | 28 ++
builtin/fetch-pack.c | 3 +
builtin/fetch.c | 27 +-
builtin/index-pack.c | 15 +
builtin/pack-objects.c | 33 +-
builtin/rev-list.c | 58 +++-
connected.c | 3 +
connected.h | 6 +
dir.c | 53 +++-
dir.h | 4 +
fetch-pack.c | 28 ++
fetch-pack.h | 2 +
list-objects-filters.c | 361 ++++++++++++++++++++++
list-objects-filters.h | 45 +++
list-objects.c | 66 +++-
list-objects.h | 30 ++
object-filter.c | 201 ++++++++++++
object-filter.h | 145 +++++++++
oidset2.c | 101 ++++++
oidset2.h | 56 ++++
t/t6112-rev-list-filters-objects.sh | 37 +++
transport.c | 27 ++
transport.h | 8 +
upload-pack.c | 39 ++-
30 files changed, 1425 insertions(+), 24 deletions(-)
create mode 100644 list-objects-filters.c
create mode 100644 list-objects-filters.h
create mode 100644 object-filter.c
create mode 100644 object-filter.h
create mode 100644 oidset2.c
create mode 100644 oidset2.h
create mode 100644 t/t6112-rev-list-filters-objects.sh
--
2.9.3
* [PATCH v2 01/19] dir: refactor add_excludes()
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.
Add add_excludes_from_blob_to_list() to allow an exclude
file to be specified with an OID.
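As an illustration of the read/parse split, the sketch below separates
"get the bytes into a newline-terminated buffer" from "walk the buffer
line by line". Both function names are hypothetical stand-ins, not the
dir.c functions, and the line walker only counts candidate patterns
instead of building exclude_list items. It assumes, as the patch does
for blobs, that the parser only looks at newline-terminated lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical "read" stage: return a heap copy of buf[0..size) that is
 * guaranteed to end in '\n' unless it is empty. *out_size receives the
 * resulting length. Returns NULL on allocation failure.
 */
static char *dup_newline_terminated(const char *buf, size_t size,
				    size_t *out_size)
{
	int need_nl = size > 0 && buf[size - 1] != '\n';
	char *copy = malloc(size + need_nl + 1); /* +1 for a NUL */

	assert(out_size != NULL);
	if (!copy)
		return NULL;
	if (size)
		memcpy(copy, buf, size);
	if (need_nl)
		copy[size++] = '\n';
	copy[size] = '\0';
	*out_size = size;
	return copy;
}

/*
 * Hypothetical "parse" stage: count newline-terminated lines that are
 * neither empty nor comments. A final line without '\n' is not seen,
 * which is why the read stage normalizes the buffer first.
 */
static size_t count_patterns(const char *buf, size_t size)
{
	size_t i, start = 0, count = 0;

	for (i = 0; i < size; i++) {
		if (buf[i] != '\n')
			continue;
		if (i > start && buf[start] != '#')
			count++;
		start = i + 1;
	}
	return count;
}
```

With this split, a file on disk and a blob in the object store can share
the same parse stage once each has produced a normalized buffer.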
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
dir.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
dir.h | 4 ++++
2 files changed, 55 insertions(+), 2 deletions(-)
diff --git a/dir.c b/dir.c
index 31f9343..aeba965 100644
--- a/dir.c
+++ b/dir.c
@@ -725,6 +725,11 @@ static void invalidate_directory(struct untracked_cache *uc,
dir->dirs[i]->recurse = 0;
}
+static int add_excludes_from_buffer(
+ char *buf, size_t size,
+ const char *base, int baselen,
+ struct exclude_list *el);
+
/*
* Given a file with name "fname", read it (either from disk, or from
* the index if "check_index" is non-zero), parse it and store the
@@ -739,9 +744,9 @@ static int add_excludes(const char *fname, const char *base, int baselen,
struct sha1_stat *sha1_stat)
{
struct stat st;
- int fd, i, lineno = 1;
+ int fd;
size_t size = 0;
- char *buf, *entry;
+ char *buf;
fd = open(fname, O_RDONLY);
if (fd < 0 || fstat(fd, &st) < 0) {
@@ -798,6 +803,18 @@ static int add_excludes(const char *fname, const char *base, int baselen,
}
}
+ add_excludes_from_buffer(buf, size, base, baselen, el);
+ return 0;
+}
+
+static int add_excludes_from_buffer(
+ char *buf, size_t size,
+ const char *base, int baselen,
+ struct exclude_list *el)
+{
+ int i, lineno = 1;
+ char *entry;
+
el->filebuf = buf;
if (skip_utf8_bom(&buf, size))
@@ -826,6 +843,38 @@ int add_excludes_from_file_to_list(const char *fname, const char *base,
return add_excludes(fname, base, baselen, el, check_index, NULL);
}
+int add_excludes_from_blob_to_list(
+ struct object_id *oid,
+ const char *base, int baselen,
+ struct exclude_list *el)
+{
+ char *buf;
+ unsigned long size;
+ enum object_type type;
+
+ buf = read_sha1_file(oid->hash, &type, &size);
+ if (!buf)
+ return -1;
+
+ if (type != OBJ_BLOB) {
+ free(buf);
+ return -1;
+ }
+
+ if (size == 0) {
+ free(buf);
+ return 0;
+ }
+
+ if (buf[size - 1] != '\n') {
+ buf = xrealloc(buf, st_add(size, 1));
+ buf[size++] = '\n';
+ }
+
+ add_excludes_from_buffer(buf, size, base, baselen, el);
+ return 0;
+}
+
struct exclude_list *add_exclude_list(struct dir_struct *dir,
int group_type, const char *src)
{
diff --git a/dir.h b/dir.h
index edb5fda..8e754e5 100644
--- a/dir.h
+++ b/dir.h
@@ -242,6 +242,10 @@ extern struct exclude_list *add_exclude_list(struct dir_struct *dir,
extern int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen,
struct exclude_list *el, int check_index);
extern void add_excludes_from_file(struct dir_struct *, const char *fname);
+extern int add_excludes_from_blob_to_list(
+ struct object_id *oid,
+ const char *base, int baselen,
+ struct exclude_list *el);
extern void parse_exclude_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen);
extern void add_exclude(const char *string, const char *base,
int baselen, struct exclude_list *el, int srcpos);
--
2.9.3
* [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create a subclass of oidset where each entry has a
field to store the length of the object's content
and an optional pathname.
This will be used in a future commit to build a
manifest of omitted objects in a partial/narrow
clone/fetch.
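The shape of the data structure can be sketched without git's hashmap.
The code below is a simplified stand-in, assuming a plain growable array
with a linear lookup instead of a hash table; all names (omit_set,
omit_entry, ID_LEN) are hypothetical. It keeps the same contract as the
patch: insert returns 1 when the id is already present, the length is
signed with -1 meaning unknown, and entries can be visited in id order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 20 /* assumption: SHA-1 sized object ids */

struct omit_entry {
	unsigned char id[ID_LEN];
	int64_t object_length; /* signed; -1 when unknown */
	char *pathname;        /* optional, owned by the entry */
};

struct omit_set {
	struct omit_entry *items;
	size_t nr, alloc;
};

static struct omit_entry *omit_set_get(struct omit_set *set,
				       const unsigned char *id)
{
	size_t i;

	assert(set && id);
	for (i = 0; i < set->nr; i++)
		if (!memcmp(set->items[i].id, id, ID_LEN))
			return &set->items[i];
	return NULL;
}

/* Returns 1 if already present, 0 if inserted, -1 if out of memory. */
static int omit_set_insert(struct omit_set *set, const unsigned char *id,
			   int64_t object_length, const char *pathname)
{
	struct omit_entry *e;
	char *path_copy = NULL;

	if (omit_set_get(set, id))
		return 1;
	if (pathname) {
		size_t len = strlen(pathname) + 1;
		path_copy = malloc(len);
		if (!path_copy)
			return -1;
		memcpy(path_copy, pathname, len);
	}
	if (set->nr == set->alloc) {
		size_t n = set->alloc ? set->alloc * 2 : 8;
		struct omit_entry *p = realloc(set->items, n * sizeof(*p));
		if (!p) {
			free(path_copy);
			return -1;
		}
		set->items = p;
		set->alloc = n;
	}
	e = &set->items[set->nr++];
	memcpy(e->id, id, ID_LEN);
	e->object_length = object_length;
	e->pathname = path_copy;
	return 0;
}

static int omit_entry_cmp(const void *a, const void *b)
{
	return memcmp(((const struct omit_entry *)a)->id,
		      ((const struct omit_entry *)b)->id, ID_LEN);
}

/* Sort by id so that a manifest printed from the set is deterministic. */
static void omit_set_sort(struct omit_set *set)
{
	if (set->nr)
		qsort(set->items, set->nr, sizeof(*set->items), omit_entry_cmp);
}

/* Free every owned pathname, then the array itself. */
static void omit_set_clear(struct omit_set *set)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		free(set->items[i].pathname);
	free(set->items);
	memset(set, 0, sizeof(*set));
}
```

Note that clearing frees each pathname explicitly, since the entries own
their copies.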
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
Makefile | 1 +
oidset2.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
oidset2.h | 56 ++++++++++++++++++++++++++++++++++
3 files changed, 158 insertions(+)
create mode 100644 oidset2.c
create mode 100644 oidset2.h
diff --git a/Makefile b/Makefile
index ffa6da7..d590508 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-merge.o
LIB_OBJS += notes-utils.o
LIB_OBJS += object.o
LIB_OBJS += oidset.o
+LIB_OBJS += oidset2.o
LIB_OBJS += pack-bitmap.o
LIB_OBJS += pack-bitmap-write.o
LIB_OBJS += pack-check.o
diff --git a/oidset2.c b/oidset2.c
new file mode 100644
index 0000000..806d153
--- /dev/null
+++ b/oidset2.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "oidset2.h"
+
+static int oidset2_hashcmp(const void *va, const void *vb,
+ const void *vkey)
+{
+ const struct oidset2_entry *a = va, *b = vb;
+ const struct object_id *key = vkey;
+ return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid)
+{
+ struct hashmap_entry key;
+ struct oidset2_entry *value;
+
+ if (!set->map.cmpfn)
+ return NULL;
+
+ hashmap_entry_init(&key, sha1hash(oid->hash));
+ value = hashmap_get(&set->map, &key, oid);
+
+ return value;
+}
+
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid)
+{
+ return !!oidset2_get(set, oid);
+}
+
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+ int64_t object_length, const char *pathname)
+{
+ struct oidset2_entry *entry;
+
+ if (!set->map.cmpfn)
+ hashmap_init(&set->map, oidset2_hashcmp, 0);
+
+ if (oidset2_contains(set, oid))
+ return 1;
+
+ entry = xcalloc(1, sizeof(*entry));
+ hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+ oidcpy(&entry->oid, oid);
+
+ entry->object_length = object_length;
+ if (pathname)
+ entry->pathname = xstrdup(pathname);
+
+ hashmap_add(&set->map, entry);
+ return 0;
+}
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid)
+{
+ struct hashmap_entry key;
+ struct oidset2_entry *e;
+
+ hashmap_entry_init(&key, sha1hash(oid->hash));
+ e = hashmap_remove(&set->map, &key, oid);
+
+ free(e->pathname);
+ free(e);
+}
+
+void oidset2_clear(struct oidset2 *set)
+{
+ hashmap_free(&set->map, 1);
+}
+
+static int oidset2_cmp(const void *a, const void *b)
+{
+ const struct oidset2_entry *ae = *((const struct oidset2_entry **)a);
+ const struct oidset2_entry *be = *((const struct oidset2_entry **)b);
+
+ return oidcmp(&ae->oid, &be->oid);
+}
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data)
+{
+ struct hashmap_iter iter;
+ struct oidset2_entry **array;
+ struct oidset2_entry *e;
+ int j, k;
+
+ array = xcalloc(set->map.size, sizeof(*array));
+
+ hashmap_iter_init(&set->map, &iter);
+ k = 0;
+ while ((e = hashmap_iter_next(&iter)))
+ array[k++] = e;
+
+ QSORT(array, k, oidset2_cmp);
+
+ for (j = 0; j < k; j++) {
+ e = array[j];
+ cb(j, k, e, cb_data);
+ }
+
+ free(array);
+}
diff --git a/oidset2.h b/oidset2.h
new file mode 100644
index 0000000..c498eae
--- /dev/null
+++ b/oidset2.h
@@ -0,0 +1,56 @@
+#ifndef OIDSET2_H
+#define OIDSET2_H
+
+/**
+ * oidset2 is a variant of oidset, but allows additional fields for each object.
+ */
+
+/**
+ * A single oidset2; should be zero-initialized (or use OIDSET2_INIT).
+ */
+struct oidset2 {
+ struct hashmap map;
+};
+
+#define OIDSET2_INIT { { NULL } }
+
+struct oidset2_entry {
+ struct hashmap_entry hash;
+ struct object_id oid;
+
+ int64_t object_length; /* This is SIGNED. Use -1 when unknown. */
+ char *pathname;
+};
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+ int64_t object_length, const char *pathname);
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid);
+
+typedef void (*oidset2_foreach_cb)(
+ int i, int i_limit,
+ struct oidset2_entry *e, void *cb_data);
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data);
+
+/**
+ * Remove all entries from the oidset2, freeing any resources associated with
+ * it.
+ */
+void oidset2_clear(struct oidset2 *set);
+
+#endif /* OIDSET2_H */
--
2.9.3
* [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted (not shown)
during a traversal.
Update traverse_commit_list() to be a wrapper for the above.
Filtering will be used in a future commit by rev-list and
pack-objects for narrow/partial clone/fetch to omit certain
blobs from the output.
traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.
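The callback contract can be sketched in isolation: the filter returns a
bit set, and the traversal applies "mark seen" and "show" independently.
The toy types and names below (toy_object, visit, FR_*) are hypothetical
and only model the control flow; they are not the list-objects.c API.

```c
#include <assert.h>
#include <stddef.h>

enum filter_result {
	FR_ZERO = 0,
	FR_MARK_SEEN = 1 << 0, /* do not ask about this object again */
	FR_SHOW = 1 << 1       /* pass the object to the show callback */
};

#define OBJ_SEEN 1u

struct toy_object {
	unsigned flags;
	int is_blob;
};

typedef int (*toy_filter_fn)(const struct toy_object *obj,
			     void *filter_data);

/* Visit one object; returns 1 if it would have been shown. */
static int visit(struct toy_object *obj, toy_filter_fn filter,
		 void *filter_data)
{
	int r = FR_MARK_SEEN | FR_SHOW; /* default when no filter is given */

	assert(obj != NULL);
	if (obj->flags & OBJ_SEEN)
		return 0;
	if (filter)
		r = filter(obj, filter_data);
	if (r & FR_MARK_SEEN)
		obj->flags |= OBJ_SEEN;
	return (r & FR_SHOW) != 0;
}

/* Hard omit: blobs are marked seen but never shown. */
static int omit_blobs(const struct toy_object *obj, void *filter_data)
{
	(void)filter_data;
	return obj->is_blob ? FR_MARK_SEEN : (FR_MARK_SEEN | FR_SHOW);
}

/* Provisional omit: blobs stay unseen, so a later visit asks again. */
static int defer_blobs(const struct toy_object *obj, void *filter_data)
{
	(void)filter_data;
	return obj->is_blob ? FR_ZERO : (FR_MARK_SEEN | FR_SHOW);
}
```

Returning neither bit leaves the object eligible for another callback
if it is reached again through a different path, which is what lets a
filter change its mind later in the traversal.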
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
list-objects.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--------------
list-objects.h | 30 ++++++++++++++++++++++++++
2 files changed, 80 insertions(+), 16 deletions(-)
diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..8dddeda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
show_object_fn show,
struct strbuf *path,
const char *name,
- void *cb_data)
+ void *cb_data,
+ filter_object_fn filter,
+ void *filter_data)
{
struct object *obj = &blob->object;
size_t pathlen;
+ list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
if (!revs->blob_objects)
return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
die("bad blob object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
- obj->flags |= SEEN;
pathlen = path->len;
strbuf_addstr(path, name);
- show(obj, path->buf, cb_data);
+ if (filter)
+ r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data);
+ if (r & LOFR_MARK_SEEN)
+ obj->flags |= SEEN;
+ if (r & LOFR_SHOW)
+ show(obj, path->buf, cb_data);
strbuf_setlen(path, pathlen);
}
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
show_object_fn show,
struct strbuf *base,
const char *name,
- void *cb_data)
+ void *cb_data,
+ filter_object_fn filter,
+ void *filter_data)
{
struct object *obj = &tree->object;
struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
all_entries_interesting: entry_not_interesting;
int baselen = base->len;
+ list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
if (!revs->tree_objects)
return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
die("bad tree object %s", oid_to_hex(&obj->oid));
}
- obj->flags |= SEEN;
strbuf_addstr(base, name);
- show(obj, base->buf, cb_data);
+ if (filter)
+ r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+ if (r & LOFR_MARK_SEEN)
+ obj->flags |= SEEN;
+ if (r & LOFR_SHOW)
+ show(obj, base->buf, cb_data);
if (base->len)
strbuf_addch(base, '/');
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
process_tree(revs,
lookup_tree(entry.oid->hash),
show, base, entry.path,
- cb_data);
+ cb_data, filter, filter_data);
else if (S_ISGITLINK(entry.mode))
process_gitlink(revs, entry.oid->hash,
show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
process_blob(revs,
lookup_blob(entry.oid->hash),
show, base, entry.path,
- cb_data);
+ cb_data, filter, filter_data);
}
+
+ if (filter) {
+ r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+ if (r & LOFR_MARK_SEEN)
+ obj->flags |= SEEN;
+ if (r & LOFR_SHOW)
+ show(obj, base->buf, cb_data);
+ }
+
strbuf_setlen(base, baselen);
free_tree_buffer(tree);
}
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
add_pending_object(revs, &tree->object, "");
}
-void traverse_commit_list(struct rev_info *revs,
- show_commit_fn show_commit,
- show_object_fn show_object,
- void *data)
+void traverse_commit_list_filtered(
+ struct rev_info *revs,
+ show_commit_fn show_commit, show_object_fn show_object, void *show_data,
+ filter_object_fn filter, void *filter_data)
{
int i;
struct commit *commit;
@@ -200,7 +223,7 @@ void traverse_commit_list(struct rev_info *revs,
*/
if (commit->tree)
add_pending_tree(revs, commit->tree);
- show_commit(commit, data);
+ show_commit(commit, show_data);
}
for (i = 0; i < revs->pending.nr; i++) {
struct object_array_entry *pending = revs->pending.objects + i;
@@ -211,19 +234,19 @@ void traverse_commit_list(struct rev_info *revs,
continue;
if (obj->type == OBJ_TAG) {
obj->flags |= SEEN;
- show_object(obj, name, data);
+ show_object(obj, name, show_data);
continue;
}
if (!path)
path = "";
if (obj->type == OBJ_TREE) {
process_tree(revs, (struct tree *)obj, show_object,
- &base, path, data);
+ &base, path, show_data, filter, filter_data);
continue;
}
if (obj->type == OBJ_BLOB) {
process_blob(revs, (struct blob *)obj, show_object,
- &base, path, data);
+ &base, path, show_data, filter, filter_data);
continue;
}
die("unknown pending object %s (%s)",
@@ -232,3 +255,14 @@ void traverse_commit_list(struct rev_info *revs,
object_array_clear(&revs->pending);
strbuf_release(&base);
}
+
+void traverse_commit_list(struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ void *show_data)
+{
+ traverse_commit_list_filtered(
+ revs,
+ show_commit, show_object, show_data,
+ NULL, NULL);
+}
diff --git a/list-objects.h b/list-objects.h
index 0cebf85..964e7d3 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -8,4 +8,34 @@ void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, voi
typedef void (*show_edge_fn)(struct commit *);
void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
+enum list_objects_filter_result {
+ LOFR_ZERO = 0,
+ LOFR_MARK_SEEN = 1<<0,
+ LOFR_SHOW = 1<<1,
+};
+
+/* See object.h and revision.h */
+#define FILTER_REVISIT (1<<25)
+
+enum list_objects_filter_type {
+ LOFT_BEGIN_TREE,
+ LOFT_END_TREE,
+ LOFT_BLOB
+};
+
+typedef enum list_objects_filter_result list_objects_filter_result;
+typedef enum list_objects_filter_type list_objects_filter_type;
+
+typedef list_objects_filter_result (*filter_object_fn)(
+ list_objects_filter_type filter_type,
+ struct object *obj,
+ const char *pathname,
+ const char *filename,
+ void *filter_data);
+
+void traverse_commit_list_filtered(
+ struct rev_info *,
+ show_commit_fn, show_object_fn, void *show_data,
+ filter_object_fn filter, void *filter_data);
+
#endif
--
2.9.3
* [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create a simple filter for traverse_commit_list_filtered() to
omit all blobs from the result.
This filter will be used in a future commit by rev-list and
pack-objects to create a "commits and trees" result. This
is intended for a narrow/partial clone/fetch.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
Makefile | 1 +
list-objects-filters.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
list-objects-filters.h | 17 ++++++++++
3 files changed, 103 insertions(+)
create mode 100644 list-objects-filters.c
create mode 100644 list-objects-filters.h
diff --git a/Makefile b/Makefile
index d590508..48fdcf2 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += levenshtein.o
LIB_OBJS += line-log.o
LIB_OBJS += line-range.o
LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filters.o
LIB_OBJS += ll-merge.o
LIB_OBJS += lockfile.o
LIB_OBJS += log-tree.o
diff --git a/list-objects-filters.c b/list-objects-filters.c
new file mode 100644
index 0000000..f29d8bc
--- /dev/null
+++ b/list-objects-filters.c
@@ -0,0 +1,85 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filters.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ */
+struct filter_omit_all_blobs_data {
+ struct oidset2 omits;
+};
+
+static list_objects_filter_result filter_omit_all_blobs(
+ list_objects_filter_type filter_type,
+ struct object *obj,
+ const char *pathname,
+ const char *filename,
+ void *filter_data_)
+{
+ struct filter_omit_all_blobs_data *filter_data = filter_data_;
+ int64_t object_length = -1;
+ unsigned long s;
+ enum object_type t;
+
+ switch (filter_type) {
+ default:
+ die("unknown filter_type");
+ return LOFR_ZERO;
+
+ case LOFT_BEGIN_TREE:
+ assert(obj->type == OBJ_TREE);
+ /* always include all tree objects */
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+
+ case LOFT_END_TREE:
+ assert(obj->type == OBJ_TREE);
+ return LOFR_ZERO;
+
+ case LOFT_BLOB:
+ assert(obj->type == OBJ_BLOB);
+ assert((obj->flags & SEEN) == 0);
+
+ /*
+ * Since we always omit all blobs (and never provisionally omit),
+ * we should never see a blob twice.
+ */
+ assert(!oidset2_contains(&filter_data->omits, &obj->oid));
+
+ t = sha1_object_info(obj->oid.hash, &s);
+ assert(t == OBJ_BLOB);
+ object_length = (int64_t)((uint64_t)(s));
+
+ /* Insert OID into the omitted list. No need for a pathname. */
+ oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+ NULL);
+ return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+ }
+}
+
+void traverse_commit_list_omit_all_blobs(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data)
+{
+ struct filter_omit_all_blobs_data d;
+
+ memset(&d, 0, sizeof(d));
+
+ traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_omit_all_blobs, &d);
+
+ if (print_omitted_object)
+ oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+ oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
new file mode 100644
index 0000000..b981020
--- /dev/null
+++ b/list-objects-filters.h
@@ -0,0 +1,17 @@
+#ifndef LIST_OBJECTS_FILTERS_H
+#define LIST_OBJECTS_FILTERS_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list_omit_all_blobs(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTERS_H */
--
2.9.3
* [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create a filter for traverse_commit_list_filtered() to omit
blobs larger than a requested size from the result, but always
include ".git*" special files.
This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.
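The per-blob decision described above can be sketched as a pure
function. The names classify_blob() and is_special_filename() are
hypothetical. The special-name test mirrors the one in the patch (a
".git" prefix followed by at least one more character), and the size
test mirrors its "length < limit" comparison, so a blob of exactly the
limit is omitted. Checking the filename before the size is an assumption
of this sketch about the intended behavior, not a description of the
patch's control flow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum blob_decision {
	BLOB_SHOW,            /* include the blob in the result */
	BLOB_PROVISIONAL_OMIT /* omit for now; may be revisited */
};

/* ".gitignore", ".gitmodules", ... but not a bare ".git". */
static int is_special_filename(const char *filename)
{
	assert(filename != NULL);
	return strncmp(filename, ".git", 4) == 0 && filename[4] != '\0';
}

static enum blob_decision classify_blob(const char *filename,
					int64_t size, int64_t max_bytes)
{
	if (is_special_filename(filename))
		return BLOB_SHOW; /* special files are always kept */
	if (size < max_bytes)
		return BLOB_SHOW;
	/*
	 * Too large and not special under this name. The same blob may
	 * still appear elsewhere under a special name, so the omission
	 * is only provisional.
	 */
	return BLOB_PROVISIONAL_OMIT;
}
```

The omission has to stay provisional because the decision depends on the
filename, and one blob can be referenced under several names during a
traversal.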
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
list-objects-filters.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++
list-objects-filters.h | 12 +++++++
2 files changed, 109 insertions(+)
diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
oidset2_clear(&d.omits);
}
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+ struct oidset2 omits;
+ int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+ list_objects_filter_type filter_type,
+ struct object *obj,
+ const char *pathname,
+ const char *filename,
+ void *filter_data_)
+{
+ struct filter_omit_large_blobs_data *filter_data = filter_data_;
+ int64_t object_length = -1;
+ unsigned long s;
+ enum object_type t;
+
+ switch (filter_type) {
+ default:
+ die("unknown filter_type");
+ return LOFR_ZERO;
+
+ case LOFT_BEGIN_TREE:
+ assert(obj->type == OBJ_TREE);
+ /* always include all tree objects */
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+
+ case LOFT_END_TREE:
+ assert(obj->type == OBJ_TREE);
+ return LOFR_ZERO;
+
+ case LOFT_BLOB:
+ assert(obj->type == OBJ_BLOB);
+ assert((obj->flags & SEEN) == 0);
+
+ /*
+ * If previously provisionally omitted (because of size), see if the
+ * current filename is special and force it to be included.
+ */
+ if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+ if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+ oidset2_remove(&filter_data->omits, &obj->oid);
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+ }
+ return LOFR_ZERO; /* continue provisionally omitting it */
+ }
+
+ t = sha1_object_info(obj->oid.hash, &s);
+ assert(t == OBJ_BLOB);
+ object_length = (int64_t)((uint64_t)(s));
+
+ if (object_length < filter_data->max_bytes)
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+
+ /*
+ * Provisionally omit it. We've already established that this blob
+ * is too big and doesn't have a special filename, so we WANT to
+ * omit it. However, there may be a special file elsewhere in the
+ * tree that references this same blob, so we cannot reject it yet.
+ * Leave the LOFR_ bits unset so that if the blob appears again in
+ * the traversal, we will be asked again.
+ *
+ * No need for a pathname, since we only test for special filenames
+ * above.
+ */
+ oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+ NULL);
+ return LOFR_ZERO;
+ }
+}
+
+void traverse_commit_list_omit_large_blobs(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ int64_t large_byte_limit)
+{
+ struct filter_omit_large_blobs_data d;
+
+ memset(&d, 0, sizeof(d));
+ d.max_bytes = large_byte_limit;
+
+ traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_omit_large_blobs, &d);
+
+ if (print_omitted_object)
+ oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+ oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
oidset2_foreach_cb print_omitted_object,
void *ctx_data);
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ int64_t large_byte_limit);
+
#endif /* LIST_OBJECTS_FILTERS_H */
--
2.9.3
* [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter
From: Jeff Hostetler @ 2017-07-13 17:34 UTC
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create a filter for traverse_commit_list_filtered() to omit the
blobs that would not be needed by a sparse checkout using the
given sparse-checkout spec.
This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.
A future enhancement should be able to also omit tree objects
not needed by such a sparse checkout, but that is not currently
supported.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
list-objects-filters.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
list-objects-filters.h | 16 +++++
2 files changed, 195 insertions(+)
diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
oidset2_clear(&d.omits);
}
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+ int defval;
+ int child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+ struct oidset2 omits;
+ struct exclude_list el;
+
+ size_t nr, alloc;
+ struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+ list_objects_filter_type filter_type,
+ struct object *obj,
+ const char *pathname,
+ const char *filename,
+ void *filter_data_)
+{
+ struct filter_use_sparse_data *filter_data = filter_data_;
+ int64_t object_length = -1;
+ int val, dtype;
+ unsigned long s;
+ enum object_type t;
+ struct frame *frame;
+
+ switch (filter_type) {
+ default:
+ die("unknown filter_type");
+ return LOFR_ZERO;
+
+ case LOFT_BEGIN_TREE:
+ assert(obj->type == OBJ_TREE);
+ dtype = DT_DIR;
+ val = is_excluded_from_list(pathname, strlen(pathname),
+ filename, &dtype, &filter_data->el);
+ if (val < 0)
+ val = filter_data->array_frame[filter_data->nr].defval;
+
+ ALLOC_GROW(filter_data->array_frame, filter_data->nr + 2,
+ filter_data->alloc);
+ filter_data->nr++;
+ filter_data->array_frame[filter_data->nr].defval = val;
+ filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+ /*
+ * A directory with this tree OID may appear in multiple
+ * places in the tree. (Think of a directory move, with
+ * no other changes.) And with a different pathname, the
+ * is_excluded...() results for this directory and items
+ * contained within it may be different. So we cannot
+ * mark it SEEN (yet), since that will prevent process_tree()
+ * from revisiting this tree object with other pathnames.
+ *
+ * Only SHOW the tree object the first time we visit this
+ * tree object.
+ *
+ * We always show all tree objects. A future optimization
+ * may want to attempt to narrow this.
+ */
+ if (obj->flags & FILTER_REVISIT)
+ return LOFR_ZERO;
+ obj->flags |= FILTER_REVISIT;
+ return LOFR_SHOW;
+
+ case LOFT_END_TREE:
+ assert(obj->type == OBJ_TREE);
+ assert(filter_data->nr > 0);
+
+ frame = &filter_data->array_frame[filter_data->nr];
+ filter_data->nr--;
+
+ /*
+ * Tell our parent directory if any of our children were
+ * provisionally omitted.
+ */
+ filter_data->array_frame[filter_data->nr].child_prov_omit |=
+ frame->child_prov_omit;
+
+ /*
+ * If there are NO provisionally omitted child objects (ALL child
+ * objects in this folder were INCLUDED), then we can mark the
+ * folder as SEEN (so we will not have to revisit it again).
+ */
+ if (!frame->child_prov_omit)
+ return LOFR_MARK_SEEN;
+ return LOFR_ZERO;
+
+ case LOFT_BLOB:
+ assert(obj->type == OBJ_BLOB);
+ assert((obj->flags & SEEN) == 0);
+
+ frame = &filter_data->array_frame[filter_data->nr];
+
+ /*
+ * If we previously provisionally omitted this blob because
+ * its pathname was not in the sparse-checkout AND this
+ * reference to the blob has the same pathname, we can avoid
+ * repeating the exclusion logic on this pathname and just
+ * continue to provisionally omit it.
+ */
+ if (obj->flags & FILTER_REVISIT) {
+ struct oidset2_entry *entry_prev;
+ entry_prev = oidset2_get(&filter_data->omits, &obj->oid);
+ if (entry_prev && !strcmp(pathname, entry_prev->pathname)) {
+ frame->child_prov_omit = 1;
+ return LOFR_ZERO;
+ }
+ }
+
+ dtype = DT_REG;
+ val = is_excluded_from_list(pathname, strlen(pathname),
+ filename, &dtype, &filter_data->el);
+ if (val < 0)
+ val = frame->defval;
+ if (val > 0)
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+
+ t = sha1_object_info(obj->oid.hash, &s);
+ assert(t == OBJ_BLOB);
+ object_length = (int64_t)((uint64_t)(s));
+
+ /*
+ * Provisionally omit it. We've already established that
+ * this pathname is not in the sparse-checkout specification,
+ * so we WANT to omit this blob. However, a pathname elsewhere
+ * in the tree may also reference this same blob, so we cannot
+ * reject it yet. Leave the LOFR_ bits unset so that if the
+ * blob appears again in the traversal, we will be asked again.
+ *
+ * The pathname we associate with this omit is just the first
+ * one we saw for this blob. Other instances of this blob may
+ * have other pathnames and that is fine. We just use it for
+ * perf because most of the time, the blob will be in the same
+ * place as we walk the commits.
+ */
+ oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+ pathname);
+ obj->flags |= FILTER_REVISIT;
+ frame->child_prov_omit = 1;
+ return LOFR_ZERO;
+ }
+}
+
+void traverse_commit_list_use_sparse(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ struct object_id *oid)
+{
+ struct filter_use_sparse_data d;
+
+ memset(&d, 0, sizeof(d));
+ if (add_excludes_from_blob_to_list(oid, NULL, 0, &d.el) < 0)
+ die("filter_use_sparse could not load specification");
+ ALLOC_GROW(d.array_frame, d.nr + 1, d.alloc);
+ d.array_frame[d.nr].defval = 0; /* default to include */
+ d.array_frame[d.nr].child_prov_omit = 0;
+
+ traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_use_sparse, &d);
+
+ if (print_omitted_object)
+ oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+ oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index 32b2833..52e507b 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -26,4 +26,20 @@ void traverse_commit_list_omit_large_blobs(
void *ctx_data,
int64_t large_byte_limit);
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+void traverse_commit_list_use_sparse(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ struct object_id *oid);
+
#endif /* LIST_OBJECTS_FILTERS_H */
--
2.9.3
* [PATCH v2 07/19] object-filter: common declarations for object filtering
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (5 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
` (11 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Create common routines and defines for parsing
object-filter-related command line arguments and
pack-protocol fields.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
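A minimal, stand-alone sketch of the hand-parsed "--<key>=<value>" form and the "<digits>[kmg]" size syntax mentioned above. The real code uses git's skip_prefix() and git_parse_ulong(); the helpers below (sketch_skip_prefix, sketch_parse_size, sketch_parse_large_blob_arg) are hypothetical stand-ins written against the C standard library only, and the assumption that suffixes are powers of 1024 and case-insensitive is mine.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for skip_prefix(): on match, *out points past the prefix. */
static int sketch_skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Parse "<digits>[kmg]" into bytes; returns 1 on success, 0 on bad input or overflow. */
static int sketch_parse_size(const char *arg, unsigned long *bytes)
{
	char *end;
	unsigned long val, factor = 1;

	assert(arg && bytes);
	if (!isdigit((unsigned char)*arg))
		return 0;
	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno == ERANGE)
		return 0;
	switch (tolower((unsigned char)*end)) {
	case 'k': factor = 1024UL; end++; break;
	case 'm': factor = 1024UL * 1024; end++; break;
	case 'g': factor = 1024UL * 1024 * 1024; end++; break;
	default: break;
	}
	if (*end)
		return 0; /* trailing garbage after the optional unit */
	if (val > (unsigned long)-1 / factor)
		return 0; /* the multiplication would overflow */
	*bytes = val * factor;
	return 1;
}

/* Returns 1 and fills *limit if arg is "--filter-omit-large-blobs=<n>[kmg]". */
static int sketch_parse_large_blob_arg(const char *arg, unsigned long *limit)
{
	const char *value;

	if (!sketch_skip_prefix(arg, "--filter-omit-large-blobs=", &value))
		return 0;
	return sketch_parse_size(value, limit);
}
```

The protocol variant differs only in its delimiters: no leading dashes and a space instead of "=" between key and value.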
Makefile | 1 +
object-filter.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
object-filter.h | 145 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 347 insertions(+)
create mode 100644 object-filter.c
create mode 100644 object-filter.h
diff --git a/Makefile b/Makefile
index 48fdcf2..daa9ea2 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-cache.o
LIB_OBJS += notes-merge.o
LIB_OBJS += notes-utils.o
LIB_OBJS += object.o
+LIB_OBJS += object-filter.o
LIB_OBJS += oidset.o
LIB_OBJS += oidset2.o
LIB_OBJS += pack-bitmap.o
diff --git a/object-filter.c b/object-filter.c
new file mode 100644
index 0000000..5be6129
--- /dev/null
+++ b/object-filter.c
@@ -0,0 +1,201 @@
+#include "cache.h"
+#include "commit.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "oidset2.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
+
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options)
+{
+ if (object_filter_enabled(filter_options))
+ die(_("multiple object filter types cannot be combined"));
+
+ filter_options->omit_all_blobs = 1;
+ return 0;
+}
+
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+ const char *arg)
+{
+ if (object_filter_enabled(filter_options))
+ die(_("multiple object filter types cannot be combined"));
+
+ filter_options->omit_large_blobs = 1;
+
+ /* we allow "<digits>[kmg]" */
+ if (!git_parse_ulong(arg, &filter_options->large_byte_limit))
+ die(_("invalid size limit for large object filter"));
+
+ filter_options->large_byte_limit_string = strdup(arg);
+ return 0;
+}
+
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+ const char *arg)
+{
+ struct object_context oc;
+
+ if (object_filter_enabled(filter_options))
+ die(_("multiple object filter types cannot be combined"));
+
+ filter_options->use_sparse = 1;
+
+ /*
+ * The command line argument needs to resolve to a known OID
+ * representing the content of the desired sparse-checkout file.
+ * We allow various syntax forms for the convenience of the user.
+ * See sha1_name.c:get_sha1_with_context_1().
+ *
+ * Try to evaluate the arg locally in case they use one of the
+ * convenience patterns. This must resolve to a blob.
+ */
+ if (get_sha1_with_context(arg, GET_SHA1_BLOB,
+ filter_options->sparse_oid.hash, &oc)) {
+ /*
+ * If that fails, keep the original string in case a client
+ * command wants to send it to the server. This allows the
+ * client to name an OID for a blob they don't have.
+ */
+ filter_options->sparse_value = strdup(arg);
+ oidcpy(&filter_options->sparse_oid, &null_oid);
+ } else {
+ /*
+ * Round-trip the found OID to normalize it.
+ */
+ filter_options->sparse_value =
+ strdup(oid_to_hex(&filter_options->sparse_oid));
+ }
+
+ return 0;
+}
+
+int parse_filter_print_manifest(struct object_filter_options *filter_options)
+{
+ filter_options->print_manifest = 1;
+ return 0;
+}
+
+int parse_filter_relax(struct object_filter_options *filter_options)
+{
+ filter_options->relax = 1;
+ return 0;
+}
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+ const char *arg, int unset)
+{
+ struct object_filter_options *filter_options = opt->value;
+
+ assert(!arg);
+ assert(!unset);
+
+ return parse_filter_omit_all_blobs(filter_options);
+}
+
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+ const char *arg, int unset)
+{
+ struct object_filter_options *filter_options = opt->value;
+
+ assert(arg);
+ assert(!unset);
+
+ return parse_filter_omit_large_blobs(filter_options, arg);
+}
+
+int opt_parse_filter_use_sparse(const struct option *opt,
+ const char *arg, int unset)
+{
+ struct object_filter_options *filter_options = opt->value;
+
+ assert(arg);
+ assert(!unset);
+
+ return parse_filter_use_sparse(filter_options, arg);
+}
+
+int opt_parse_filter_print_manifest(const struct option *opt,
+ const char *arg, int unset)
+{
+ struct object_filter_options *filter_options = opt->value;
+
+ assert(!arg);
+ assert(!unset);
+
+ return parse_filter_print_manifest(filter_options);
+}
+
+int opt_parse_filter_relax(const struct option *opt,
+ const char *arg, int unset)
+{
+ struct object_filter_options *filter_options = opt->value;
+
+ assert(!arg);
+ assert(!unset);
+
+ return parse_filter_relax(filter_options);
+}
+
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+ const char *arg,
+ int allow_print_manifest,
+ int allow_relax)
+{
+ if (!strcmp(arg, ("--"CL_ARG_FILTER_OMIT_ALL_BLOBS))) {
+ parse_filter_omit_all_blobs(filter_options);
+ return 1;
+ }
+ if (skip_prefix(arg, ("--"CL_ARG_FILTER_OMIT_LARGE_BLOBS"="), &arg)) {
+ parse_filter_omit_large_blobs(filter_options, arg);
+ return 1;
+ }
+ if (skip_prefix(arg, ("--"CL_ARG_FILTER_USE_SPARSE"="), &arg)) {
+ parse_filter_use_sparse(filter_options, arg);
+ return 1;
+ }
+
+ if (allow_print_manifest &&
+ !strcmp(arg, ("--"CL_ARG_FILTER_PRINT_MANIFEST))) {
+ parse_filter_print_manifest(filter_options);
+ return 1;
+ }
+
+ if (allow_relax && !strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+ parse_filter_relax(filter_options);
+ return 1;
+ }
+
+ return 0;
+}
+
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+ const char *arg,
+ int allow_print_manifest,
+ int allow_relax)
+{
+ if (!strcmp(arg, CL_ARG_FILTER_OMIT_ALL_BLOBS)) {
+ parse_filter_omit_all_blobs(filter_options);
+ return 1;
+ }
+ if (skip_prefix(arg, (CL_ARG_FILTER_OMIT_LARGE_BLOBS" "), &arg)) {
+ parse_filter_omit_large_blobs(filter_options, arg);
+ return 1;
+ }
+ if (skip_prefix(arg, (CL_ARG_FILTER_USE_SPARSE" "), &arg)) {
+ parse_filter_use_sparse(filter_options, arg);
+ return 1;
+ }
+
+ if (allow_print_manifest &&
+ !strcmp(arg, CL_ARG_FILTER_PRINT_MANIFEST)) {
+ parse_filter_print_manifest(filter_options);
+ return 1;
+ }
+ if (allow_relax && !strcmp(arg, CL_ARG_FILTER_RELAX)) {
+ parse_filter_relax(filter_options);
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/object-filter.h b/object-filter.h
new file mode 100644
index 0000000..f1ca5fb
--- /dev/null
+++ b/object-filter.h
@@ -0,0 +1,145 @@
+#ifndef OBJECT_FILTER_H
+#define OBJECT_FILTER_H
+
+#include "parse-options.h"
+
+/*
+ * Common declarations and utilities for filtering objects (such as omitting
+ * large blobs) during fetch-pack, upload-pack, and the pack-protocol. These
+ * are intended for partial/narrow clone/fetch.
+ */
+
+struct object_filter_options {
+ /*
+ * blob-ish path or value that get_sha1_with_context() can turn into
+ * an OID to find the blob containing the sparse-checkout specification.
+ * only used when use_sparse is set.
+ */
+ const char *sparse_value;
+ struct object_id sparse_oid;
+
+ /*
+ * blob size byte limit for filtering. only blobs smaller than this
+ * value will be included. a value of zero omits all blobs.
+ * only used when omit_large_blobs is set. Integer and string versions
+ * of this are kept for convenience.
+ */
+ unsigned long large_byte_limit;
+ const char *large_byte_limit_string;
+
+ /* valid filter types (only one may be used at a time) */
+ unsigned omit_all_blobs : 1;
+ unsigned omit_large_blobs : 1;
+ unsigned use_sparse : 1;
+
+ /* true if the filter should output a manifest of the omitted objects. */
+ unsigned print_manifest : 1;
+
+ /* true to suppress missing object errors during consistency checks */
+ unsigned relax : 1;
+};
+
+/*
+ * Return true if a filter is enabled.
+ */
+static inline int object_filter_enabled(const struct object_filter_options *p)
+{
+ return p->omit_all_blobs || p->omit_large_blobs || p->use_sparse;
+}
+
+/* See Documentation/technical/protocol-capabilities.txt */
+#define PROTOCOL_CAPABILITY_FILTER_OBJECTS "filter-objects"
+
+/* See Documentation/technical/pack-protocol.txt */
+#define PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS "filter-omit-all-blobs"
+#define PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS "filter-omit-large-blobs"
+#define PROTOCOL_REQUEST_FILTER_USE_SPARSE "filter-use-sparse"
+
+/* Normalized command line arguments */
+#define CL_ARG_FILTER_OMIT_ALL_BLOBS "filter-omit-all-blobs"
+#define CL_ARG_FILTER_OMIT_LARGE_BLOBS "filter-omit-large-blobs"
+#define CL_ARG_FILTER_USE_SPARSE "filter-use-sparse"
+#define CL_ARG_FILTER_PRINT_MANIFEST "filter-print-manifest"
+#define CL_ARG_FILTER_RELAX "filter-relax"
+
+/*
+ * Common command line argument parsing for object-filter-related
+ * arguments (whether from a hand-parsed or parse-options style
+ * parser).
+ */
+int parse_filter_omit_all_blobs(struct object_filter_options *filter_options);
+int parse_filter_omit_large_blobs(struct object_filter_options *filter_options,
+ const char *arg);
+int parse_filter_use_sparse(struct object_filter_options *filter_options,
+ const char *arg);
+int parse_filter_print_manifest(struct object_filter_options *filter_options);
+int parse_filter_relax(struct object_filter_options *filter_options);
+
+/*
+ * Common command line argument parsers for object-filter-related
+ * arguments coming from parse-options style parsers.
+ */
+
+int opt_parse_filter_omit_all_blobs(const struct option *opt,
+ const char *arg, int unset);
+int opt_parse_filter_omit_large_blobs(const struct option *opt,
+ const char *arg, int unset);
+int opt_parse_filter_use_sparse(const struct option *opt,
+ const char *arg, int unset);
+int opt_parse_filter_print_manifest(const struct option *opt,
+ const char *arg, int unset);
+int opt_parse_filter_relax(const struct option *opt,
+ const char *arg, int unset);
+
+#define OPT_PARSE_FILTER_OMIT_ALL_BLOBS(fo) \
+ { OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_ALL_BLOBS, fo, NULL, \
+ N_("omit all blobs from result"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+ opt_parse_filter_omit_all_blobs }
+
+#define OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(fo) \
+ { OPTION_CALLBACK, 0, CL_ARG_FILTER_OMIT_LARGE_BLOBS, fo, N_("size"), \
+ N_("omit large blobs from result"), PARSE_OPT_NONEG, \
+ opt_parse_filter_omit_large_blobs }
+
+#define OPT_PARSE_FILTER_USE_SPARSE(fo) \
+ { OPTION_CALLBACK, 0, CL_ARG_FILTER_USE_SPARSE, fo, N_("object"), \
+ N_("filter results using sparse-checkout specification"), PARSE_OPT_NONEG, \
+ opt_parse_filter_use_sparse }
+
+#define OPT_PARSE_FILTER_PRINT_MANIFEST(fo) \
+ { OPTION_CALLBACK, 0, CL_ARG_FILTER_PRINT_MANIFEST, fo, NULL, \
+ N_("print manifest of omitted objects"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, \
+ opt_parse_filter_print_manifest }
+
+#define OPT_PARSE_FILTER_RELAX(fo) \
+ { OPTION_CALLBACK, 0, CL_ARG_FILTER_RELAX, fo, NULL, \
+ N_("relax consistency checks for previously omitted objects"), \
+ PARSE_OPT_NOARG | PARSE_OPT_NONEG, opt_parse_filter_relax }
+
+/*
+ * Hand parse known object-filter command line options.
+ * Use this when the caller DOES NOT use the normal OPT_
+ * routines.
+ *
+ * Here we assume args of the form "--<key>" or "--<key>=<value>".
+ * Note the literal dash-dash and equals.
+ *
+ * Returns 1 if we handled the argument.
+ */
+int object_filter_hand_parse_arg(struct object_filter_options *filter_options,
+ const char *arg,
+ int allow_print_manifest,
+ int allow_relax);
+
+/*
+ * Hand parse known object-filter protocol lines.
+ *
+ * Here we assume args of the form "<key>" or "<key> <value>".
+ * Note the literal space between the key and value.
+ */
+int object_filter_hand_parse_protocol(struct object_filter_options *filter_options,
+ const char *arg,
+ int allow_print_manifest,
+ int allow_relax);
+
+#endif /* OBJECT_FILTER_H */
--
2.9.3
* [PATCH v2 08/19] rev-list: add object filtering support
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (6 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
` (10 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach rev-list to use the filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.
This feature is only enabled when one of the "--objects*"
options is used.
When the "--filter-print-manifest" option is used, the
omitted objects and their sizes are printed at the end.
These are marked with a "~". This can be combined with
"--quiet" to get a list of just the omitted objects.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
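A minimal, stand-alone sketch of the manifest line format described above: a "~" prefix, the object ID, and the size when one was recorded. The function name sketch_format_omitted is hypothetical, the object ID is assumed to be a 40-character SHA-1 hex string, and -1 is taken to mean "no size recorded", mirroring the check in this patch. It formats into a caller-supplied buffer rather than calling printf() so that the result can be inspected.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one manifest line for an omitted object into buf.
 * Returns the number of characters snprintf() would have written.
 */
static int sketch_format_omitted(char *buf, size_t buflen,
				 const char *hex_oid, int64_t object_length)
{
	assert(buf && hex_oid);
	assert(strlen(hex_oid) == 40); /* assumption: SHA-1 hex object ID */

	if (object_length == -1)
		return snprintf(buf, buflen, "~%s\n", hex_oid);
	/* The size is signed here, so use the matching signed format. */
	return snprintf(buf, buflen, "~%s %" PRId64 "\n", hex_oid, object_length);
}
```

With "--quiet" only these "~" lines remain in the output, so a consumer can strip the prefix and split on the space to recover the object ID and size.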
builtin/rev-list.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bcf77f0..fd9a7e5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
#include "pack.h"
#include "pack-bitmap.h"
#include "builtin.h"
@@ -52,6 +54,7 @@ static const char rev_list_usage[] =
static struct progress *progress;
static unsigned progress_counter;
+static struct object_filter_options filter_options;
static void finish_commit(struct commit *commit, void *data);
static void show_commit(struct commit *commit, void *data)
@@ -178,8 +181,20 @@ static void finish_commit(struct commit *commit, void *data)
static void finish_object(struct object *obj, const char *name, void *cb_data)
{
struct rev_list_info *info = cb_data;
- if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+ if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+ if (filter_options.relax) {
+ /*
+ * Relax consistency checks to not complain about
+ * omitted objects (presumably caused by a previous
+ * use of the 'filter-objects' feature).
+ *
+ * Note that this is independent of any filtering that
+ * we are doing in this run.
+ */
+ return;
+ }
die("missing blob object '%s'", oid_to_hex(&obj->oid));
+ }
if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
parse_object(obj->oid.hash);
}
@@ -199,6 +214,16 @@ static void show_edge(struct commit *commit)
printf("-%s\n", oid_to_hex(&commit->object.oid));
}
+static void print_omitted_object(int i, int i_limit, struct oidset2_entry *e, void *cb_data)
+{
+ /* struct rev_list_info *info = cb_data; */
+
+ if (e->object_length == -1)
+ printf("~%s\n", oid_to_hex(&e->oid));
+ else
+ printf("~%s %"PRIuMAX"\n", oid_to_hex(&e->oid), (uintmax_t)e->object_length);
+}
+
static void print_var_str(const char *var, const char *val)
{
printf("%s='%s'\n", var, val);
@@ -276,6 +301,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
int bisect_find_all = 0;
int use_bitmap_index = 0;
const char *show_progress = NULL;
+ oidset2_foreach_cb fn_filter_print = NULL;
git_config(git_default_config, NULL);
init_revisions(&revs, prefix);
@@ -329,6 +355,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
show_progress = arg;
continue;
}
+ if (object_filter_hand_parse_arg(&filter_options, arg, 1, 1)) {
+ if (!revs.blob_objects)
+ die(_("object filtering requires --objects"));
+ if (filter_options.use_sparse &&
+ !oidcmp(&filter_options.sparse_oid, &null_oid))
+ die(_("invalid sparse value"));
+ continue;
+ }
usage(rev_list_usage);
}
@@ -353,6 +387,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
if (revs.show_notes)
die(_("rev-list does not support display of notes"));
+ if (object_filter_enabled(&filter_options)) {
+ if (use_bitmap_index)
+ die(_("cannot combine --use-bitmap-index with object filtering"));
+ }
+
save_commit_buffer = (revs.verbose_header ||
revs.grep_filter.pattern_list ||
revs.grep_filter.header_list);
@@ -397,7 +436,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
return show_bisect_vars(&info, reaches, all);
}
- traverse_commit_list(&revs, show_commit, show_object, &info);
+ if (filter_options.print_manifest)
+ fn_filter_print = print_omitted_object;
+
+ if (filter_options.omit_all_blobs)
+ traverse_commit_list_omit_all_blobs(
+ &revs, show_commit, show_object, fn_filter_print, &info);
+ else if (filter_options.omit_large_blobs)
+ traverse_commit_list_omit_large_blobs(
+ &revs, show_commit, show_object, fn_filter_print, &info,
+ (int64_t)(uint64_t)filter_options.large_byte_limit);
+ else if (filter_options.use_sparse)
+ traverse_commit_list_use_sparse(
+ &revs, show_commit, show_object, fn_filter_print, &info,
+ &filter_options.sparse_oid);
+ else
+ traverse_commit_list(&revs, show_commit, show_object, &info);
stop_progress(&progress);
--
2.9.3
* [PATCH v2 09/19] rev-list: add filtering help text
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (7 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
` (9 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
Documentation/git-rev-list.txt | 7 ++++++-
Documentation/rev-list-options.txt | 26 ++++++++++++++++++++++++++
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..d20c2ab 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,12 @@ SYNOPSIS
[ --fixed-strings | -F ]
[ --date=<format>]
[ [ --objects | --objects-edge | --objects-edge-aggressive ]
- [ --unpacked ] ]
+ [ --unpacked ]
+ [ [ --filter-omit-all-blobs |
+ --filter-omit-large-blobs=<n>[kmg] |
+ --filter-use-sparse=<object> ]
+ [ --filter-print-manifest ] ] ]
+ [ --filter-relax ]
[ --pretty | --header ]
[ --bisect ]
[ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f732..e0112dd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -693,6 +693,32 @@ ifdef::git-rev-list[]
--unpacked::
Only useful with `--objects`; print the object IDs that are not
in packs.
+
+--filter-omit-all-blobs::
+ Only useful with one of the `--objects*` options; omits all blobs
+ from the printed list of objects.
+
+--filter-omit-large-blobs=<n>[kmg]::
+ Only useful with one of the `--objects*` options; omits blobs larger than
+ n bytes from the printed list of objects. May optionally be
+ followed by 'k', 'm', or 'g' units. Value may be zero. Special
+ files (matching ".git*") are always included, regardless of size.
+
+--filter-use-sparse=<object>::
+ Only useful with one of the `--objects*` options; uses a sparse-checkout
+ specification contained in the given object to filter the result
+ by omitting blobs that would not be used by the corresponding
+ sparse checkout.
+
+--filter-print-manifest::
+ Only useful with one of the above `--filter*` options; prints a manifest
+ of the omitted objects. Object IDs are prefixed with a ``~''
+ character. The object size is printed after the ID.
+
+--filter-relax::
+ Relax consistency checking for missing blobs. Do not warn of
+ missing blobs during normal (non-filtering) object traversal
+ following an earlier partial/narrow clone or fetch.
endif::git-rev-list[]
--no-walk[=(sorted|unsorted)]::
--
2.9.3
* [PATCH v2 10/19] t6112: rev-list object filtering test
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (8 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
` (8 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
t/t6112-rev-list-filters-objects.sh | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
create mode 100644 t/t6112-rev-list-filters-objects.sh
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
new file mode 100644
index 0000000..ded2b04
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+ for n in 1 2 3 4 5 ; do \
+ echo $n > file.$n && \
+ git add file.$n && \
+ git commit -m "$n" || return 1 ; \
+ done
+'
+
+test_expect_success 'omit-all-blobs omitted 5 blobs' '
+ git rev-list HEAD --objects --filter-print-manifest --filter-omit-all-blobs >omit_all &&
+ grep "^~" omit_all >omitted &&
+ test_line_count = 5 omitted
+'
+
+test_expect_success 'omit-all-blobs blob sha match' '
+ git rev-list HEAD --objects >normal &&
+ awk "/file/ {print \$1;}" <normal | sort >normal_sha &&
+ sed "s/~//" <omitted | awk "{print \$1;}" | sort >omit_all_sha &&
+ test_cmp normal_sha omit_all_sha
+'
+
+test_expect_success 'omit-all-blobs nothing else changed' '
+ grep -v "file" <normal | sort >normal_other &&
+ grep -v "~" <omit_all | sort >omit_other &&
+ test_cmp normal_other omit_other
+'
+
+# TODO test filter-omit-large-blobs
+# TODO test filter-use-sparse
+
+test_done
--
2.9.3
* [PATCH v2 11/19] pack-objects: add object filtering support
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (9 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
` (7 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach pack-objects to use filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.
This feature is intended for narrow/partial clone/fetch.
Filtering requires the "--stdout" option.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
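A minimal, stand-alone sketch of the option interplay described above: filtering is rejected unless the pack goes to stdout, and an enabled filter turns off the bitmap index. The types and names (sketch_filter_options, sketch_pack_config, sketch_filter_enabled, sketch_check_pack_options) are hypothetical; the real command calls die() instead of returning a message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the three mutually exclusive filter bits. */
struct sketch_filter_options {
	unsigned omit_all_blobs : 1;
	unsigned omit_large_blobs : 1;
	unsigned use_sparse : 1;
};

/* Hypothetical subset of the pack-objects settings that filtering touches. */
struct sketch_pack_config {
	int pack_to_stdout;
	int use_bitmap_index;
};

/* Return true if any filter is enabled. */
static int sketch_filter_enabled(const struct sketch_filter_options *p)
{
	return p->omit_all_blobs || p->omit_large_blobs || p->use_sparse;
}

/*
 * Validate the combination and adjust dependent settings.
 * Returns NULL on success, or an error message on an invalid combination.
 */
static const char *sketch_check_pack_options(const struct sketch_filter_options *fo,
					     struct sketch_pack_config *cfg)
{
	assert(fo && cfg);

	if (!cfg->pack_to_stdout && sketch_filter_enabled(fo))
		return "object filtering cannot be used when building an indexable pack.";

	/* Filtering walks the objects itself, so the bitmap index is not used. */
	if (sketch_filter_enabled(fo))
		cfg->use_bitmap_index = 0;
	return NULL;
}
```

Note the difference from rev-list in this series: rev-list refuses to combine "--use-bitmap-index" with a filter, whereas pack-objects silently disables the bitmap index.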
builtin/pack-objects.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 50e01aa..614ad60 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,8 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
+#include "list-objects-filters.h"
+#include "object-filter.h"
#include "pack-objects.h"
#include "progress.h"
#include "refs.h"
@@ -77,6 +79,8 @@ static unsigned long cache_max_small_delta_size = 1000;
static unsigned long window_memory_limit = 0;
+static struct object_filter_options filter_options;
+
/*
* stats
*/
@@ -2800,7 +2804,20 @@ static void get_object_list(int ac, const char **av)
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
- traverse_commit_list(&revs, show_commit, show_object, NULL);
+
+ if (filter_options.omit_all_blobs)
+ traverse_commit_list_omit_all_blobs(
+ &revs, show_commit, show_object, NULL, NULL);
+ else if (filter_options.omit_large_blobs)
+ traverse_commit_list_omit_large_blobs(
+ &revs, show_commit, show_object, NULL, NULL,
+ (int64_t)(uint64_t)filter_options.large_byte_limit);
+ else if (filter_options.use_sparse)
+ traverse_commit_list_use_sparse(
+ &revs, show_commit, show_object, NULL, NULL,
+ &filter_options.sparse_oid);
+ else
+ traverse_commit_list(&revs, show_commit, show_object, NULL);
if (unpack_unreachable_expiration) {
revs.ignore_missing_links = 1;
@@ -2936,6 +2953,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
N_("use a bitmap index if available to speed up counting objects")),
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
N_("write a bitmap index together with the pack index")),
+
+ OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+ /* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+ /* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
OPT_END(),
};
@@ -3007,6 +3032,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (!pack_to_stdout && thin)
die("--thin cannot be used to build an indexable pack.");
+ if (!pack_to_stdout && object_filter_enabled(&filter_options))
+ die("object filtering cannot be used when building an indexable pack.");
+
if (keep_unreachable && unpack_unreachable)
die("--keep-unreachable and --unpack-unreachable are incompatible.");
if (!rev_list_all || !rev_list_reflog || !rev_list_index)
@@ -3031,6 +3059,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
use_bitmap_index = 0;
+ if (object_filter_enabled(&filter_options))
+ use_bitmap_index = 0;
+
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
--
2.9.3
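
For readers following the dispatch added to get_object_list() in this patch, the precedence among the three filters and the "--stdout" restriction can be sketched in isolation. This is a minimal standalone sketch, not code from the series: `sketch_filter_options`, `sketch_traversal`, `sketch_pick_traversal()` and `sketch_filter_allowed()` are hypothetical names, and only the three flags visible in the diff are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three flags of struct object_filter_options. */
struct sketch_filter_options {
	unsigned omit_all_blobs : 1;
	unsigned omit_large_blobs : 1;
	unsigned use_sparse : 1;
};

enum sketch_traversal {
	SKETCH_TRAVERSE_PLAIN,
	SKETCH_TRAVERSE_OMIT_ALL_BLOBS,
	SKETCH_TRAVERSE_OMIT_LARGE_BLOBS,
	SKETCH_TRAVERSE_USE_SPARSE
};

/* Same precedence as the if/else-if chain in the patch: first match wins. */
static enum sketch_traversal sketch_pick_traversal(
	const struct sketch_filter_options *o)
{
	assert(o != NULL);
	if (o->omit_all_blobs)
		return SKETCH_TRAVERSE_OMIT_ALL_BLOBS;
	if (o->omit_large_blobs)
		return SKETCH_TRAVERSE_OMIT_LARGE_BLOBS;
	if (o->use_sparse)
		return SKETCH_TRAVERSE_USE_SPARSE;
	return SKETCH_TRAVERSE_PLAIN;
}

/* Filtering is only accepted when the pack is streamed to stdout. */
static int sketch_filter_allowed(const struct sketch_filter_options *o,
				 int pack_to_stdout)
{
	int enabled;

	assert(o != NULL);
	enabled = o->omit_all_blobs || o->omit_large_blobs || o->use_sparse;
	return !enabled || pack_to_stdout;
}
```

One consequence of the first-match-wins chain is that giving more than one filter option silently uses only the highest-priority one. Whether that should instead be reported as an error is a design question the sketch does not settle.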
* [PATCH v2 12/19] pack-objects: add filtering help text
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (10 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
` (6 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Update pack-objects help text to describe object filtering.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
Documentation/git-pack-objects.txt | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 8973510..084641f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -231,6 +231,20 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
With this option, parents that are hidden by grafts are packed
nevertheless.
+--filter-omit-all-blobs::
+ Omits all blobs from the packfile. This option requires --stdout.
+
+--filter-omit-large-blobs=<n>[kmg]::
+ Omits blobs larger than <n> bytes from the packfile. <n> may be zero
+ and may optionally be followed by a 'k', 'm', or 'g' unit suffix. Special
+ files (matching ".git*") are always included, regardless of size.
+ This option requires --stdout.
+
+--filter-use-sparse=<object>::
+ Uses a sparse-checkout specification given by <object> to filter
+ the result by omitting blobs that would not be used by the
+ corresponding sparse checkout. This option requires --stdout.
+
SEE ALSO
--------
linkgit:git-rev-list[1]
--
2.9.3
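
The `<n>[kmg]` syntax described in the help text can be illustrated with a small standalone parser. This is only a sketch under stated assumptions: `sketch_parse_magnitude()` is a hypothetical name, the units are assumed to be powers of 1024, and only the lowercase suffixes named in the help text are accepted. The series itself relies on git's own git_parse_ulong() (see the transport.c change in patch 15), which this sketch does not reproduce.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "<n>[kmg]" into a byte count.
 * Returns 1 on success and 0 on malformed input or overflow.
 * Units are assumed to be powers of 1024.
 */
static int sketch_parse_magnitude(const char *s, unsigned long *out)
{
	char *end;
	unsigned long value, factor = 1;

	assert(s != NULL && out != NULL);
	if (!isdigit((unsigned char)*s))
		return 0;	/* rejects "", "k", "-1" and " 5" */

	errno = 0;
	value = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return 0;

	switch (*end) {
	case 'k': factor = 1024UL; end++; break;
	case 'm': factor = 1024UL * 1024; end++; break;
	case 'g': factor = 1024UL * 1024 * 1024; end++; break;
	default: break;
	}
	if (*end != '\0')
		return 0;	/* trailing garbage such as "12x" or "1kb" */
	if (value > ULONG_MAX / factor)
		return 0;	/* the multiplication would overflow */

	*out = value * factor;
	return 1;
}
```

With this sketch, "10k" parses to 10240 and "0" parses to 0, which matches the "Value may be zero" wording: a limit of zero omits every non-empty blob that is not a ".git*" special file.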
* [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (11 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
Documentation/technical/pack-protocol.txt | 16 ++++++++++++++++
Documentation/technical/protocol-capabilities.txt | 7 +++++++
2 files changed, 23 insertions(+)
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a349171..dce6e04 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
upload-request = want-list
*shallow-line
*1depth-request
+ [filter-request]
flush-pkt
want-list = first-want
@@ -226,7 +227,13 @@ out of what the server said it could do with the first 'want' line.
first-want = PKT-LINE("want" SP obj-id SP capability-list)
additional-want = PKT-LINE("want" SP obj-id)
+ filter-request = PKT-LINE("filter-omit-all-blobs") /
+ PKT-LINE("filter-omit-large-blobs" SP magnitude) /
+ PKT-LINE("filter-use-sparse" SP obj-id)
+
depth = 1*DIGIT
+
+ magnitude = 1*DIGIT [ "k" | "m" | "g" ]
----
Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +256,15 @@ complete those commits. Commits whose parents are not received as a
result are defined as shallow and marked as such in the server. This
information is sent back to the client in the next step.
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial/narrow clone/fetch operations.
+"filter-omit-all-blobs" requests that all blobs be omitted from
+the packfile. "filter-omit-large-blobs" requests that blobs larger
+than the requested size be omitted, unless they have a ".git*"
+special filename. "filter-use-sparse" requests blob filtering based
+upon a sparse-checkout specification in the given blob id.
+
Once all the 'want's and 'shallow's (and optional 'deepen') are
transferred, clients MUST send a flush-pkt, to tell the server side
that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..7011eb3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
included in the push certificate. A send-pack client MUST NOT
send a push-cert packet unless the receive-pack server advertises
this capability.
+
+filter-objects
+--------------
+
+If the upload-pack server advertises the 'filter-objects' capability,
+fetch-pack may send "filter-*" commands to request a partial/narrow
+clone/fetch where the server omits various objects from the packfile.
--
2.9.3
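
To make the filter-request grammar above concrete, the following sketch frames a request as a pkt-line: four lowercase hex digits giving the total length (the four length bytes included), followed by the payload. `sketch_pkt_line()` is a hypothetical helper written for illustration and is not git's packet_buf_write(). No trailing newline is added, which is an assumption based on how the fetch-pack patch later in the series writes these lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: frame `payload` as a pkt-line in `buf`.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if the line is too long or does not fit in `buf`.
 */
static size_t sketch_pkt_line(char *buf, size_t buflen, const char *payload)
{
	size_t len;

	assert(buf != NULL && payload != NULL);
	len = strlen(payload) + 4;

	/* 65520 is the largest total pkt-line length the protocol allows. */
	if (len > 65520 || len + 1 > buflen)
		return 0;
	snprintf(buf, buflen, "%04x%s", (unsigned int)len, payload);
	return len;
}
```

For example, the 21-byte payload "filter-omit-all-blobs" is framed as "0019filter-omit-all-blobs", and "filter-omit-large-blobs 1m" as "001efilter-omit-large-blobs 1m".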
* [PATCH v2 14/19] upload-pack: add object filtering
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (12 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
` (4 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
upload-pack.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/upload-pack.c b/upload-pack.c
index ffb028d..c709054 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -17,6 +17,7 @@
#include "parse-options.h"
#include "argv-array.h"
#include "prio-queue.h"
+#include "object-filter.h"
static const char * const upload_pack_usage[] = {
N_("git upload-pack [<options>] <dir>"),
@@ -63,6 +64,9 @@ static int advertise_refs;
static int stateless_rpc;
static const char *pack_objects_hook;
+static int capability_filter_objects_requested;
+static struct object_filter_options filter_options;
+
static void reset_timeout(void)
{
alarm(timeout);
@@ -131,6 +135,30 @@ static void create_pack_file(void)
if (use_include_tag)
argv_array_push(&pack_objects.args, "--include-tag");
+ if (filter_options.omit_all_blobs)
+ argv_array_push(&pack_objects.args,
+ ("--" CL_ARG_FILTER_OMIT_ALL_BLOBS));
+ else if (filter_options.omit_large_blobs) {
+ if (filter_options.large_byte_limit_string)
+ argv_array_pushf(&pack_objects.args, "--%s=%s",
+ CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+ filter_options.large_byte_limit_string);
+ else
+ argv_array_pushf(&pack_objects.args, "--%s=%lu",
+ CL_ARG_FILTER_OMIT_LARGE_BLOBS,
+ filter_options.large_byte_limit);
+ }
+ else if (filter_options.use_sparse) {
+ if (oidcmp(&filter_options.sparse_oid, &null_oid))
+ argv_array_pushf(&pack_objects.args, "--%s=%s",
+ CL_ARG_FILTER_USE_SPARSE,
+ oid_to_hex(&filter_options.sparse_oid));
+ else
+ argv_array_pushf(&pack_objects.args, "--%s=%s",
+ CL_ARG_FILTER_USE_SPARSE,
+ filter_options.sparse_value);
+ }
+
pack_objects.in = -1;
pack_objects.out = -1;
pack_objects.err = -1;
@@ -793,6 +821,12 @@ static void receive_needs(void)
deepen_rev_list = 1;
continue;
}
+ if (object_filter_hand_parse_protocol(&filter_options, line, 0, 0)) {
+ if (!capability_filter_objects_requested)
+ die("git upload-pack: object filtering requires '%s' capability",
+ PROTOCOL_CAPABILITY_FILTER_OBJECTS);
+ continue;
+ }
if (!skip_prefix(line, "want ", &arg) ||
get_sha1_hex(arg, sha1_buf))
die("git upload-pack: protocol error, "
@@ -820,6 +854,8 @@ static void receive_needs(void)
no_progress = 1;
if (parse_feature_request(features, "include-tag"))
use_include_tag = 1;
+ if (parse_feature_request(features, PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+ capability_filter_objects_requested = 1;
o = parse_object(sha1_buf);
if (!o) {
@@ -928,7 +964,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
{
static const char *capabilities = "multi_ack thin-pack side-band"
" side-band-64k ofs-delta shallow deepen-since deepen-not"
- " deepen-relative no-progress include-tag multi_ack_detailed";
+ " deepen-relative no-progress include-tag multi_ack_detailed"
+ " " PROTOCOL_CAPABILITY_FILTER_OBJECTS;
const char *refname_nons = strip_namespace(refname);
struct object_id peeled;
--
2.9.3
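
The server-side rule added to receive_needs() (a filter request is only acceptable if the client also requested the capability) can be sketched without any git internals. Everything named `sketch_*` below is hypothetical, and the real parsing lives in object_filter_hand_parse_protocol(), which is not shown in this message. The request strings are the ones defined in the pack-protocol.txt patch.

```c
#include <assert.h>
#include <string.h>

enum sketch_filter_kind {
	SKETCH_NOT_A_FILTER,
	SKETCH_OMIT_ALL_BLOBS,
	SKETCH_OMIT_LARGE_BLOBS,
	SKETCH_USE_SPARSE
};

/*
 * Hypothetical classifier for one request line. When the request takes
 * an argument, *arg points at the text after the separating space;
 * otherwise *arg is set to NULL.
 */
static enum sketch_filter_kind sketch_classify(const char *line,
					       const char **arg)
{
	static const char large[] = "filter-omit-large-blobs ";
	static const char sparse[] = "filter-use-sparse ";

	assert(line != NULL && arg != NULL);
	*arg = NULL;
	if (!strcmp(line, "filter-omit-all-blobs"))
		return SKETCH_OMIT_ALL_BLOBS;
	if (!strncmp(line, large, sizeof(large) - 1)) {
		*arg = line + sizeof(large) - 1;
		return SKETCH_OMIT_LARGE_BLOBS;
	}
	if (!strncmp(line, sparse, sizeof(sparse) - 1)) {
		*arg = line + sizeof(sparse) - 1;
		return SKETCH_USE_SPARSE;
	}
	return SKETCH_NOT_A_FILTER;
}

/*
 * Returns 1 if the line may be processed, 0 if it is a filter request
 * sent without the capability having been requested (a protocol error).
 */
static int sketch_request_permitted(const char *line, int capability_requested)
{
	const char *arg;

	if (sketch_classify(line, &arg) == SKETCH_NOT_A_FILTER)
		return 1;
	return capability_requested != 0;
}
```

The check depends on ordering: the capability list travels on the first "want" line, so the flag is already set by the time a filter line arrives.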
* [PATCH v2 15/19] fetch-pack: add object filtering support
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (13 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
` (3 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
builtin/fetch-pack.c | 3 +++
fetch-pack.c | 28 ++++++++++++++++++++++++++++
fetch-pack.h | 2 ++
transport.c | 27 +++++++++++++++++++++++++++
transport.h | 8 ++++++++
5 files changed, 68 insertions(+)
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..72f9063 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.update_shallow = 1;
continue;
}
+ if (object_filter_hand_parse_arg(&args.filter_options, arg, 0, 0)) {
+ continue;
+ }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b05..642077d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -374,6 +374,8 @@ static int find_common(struct fetch_pack_args *args,
if (prefer_ofs_delta) strbuf_addstr(&c, " ofs-delta");
if (deepen_since_ok) strbuf_addstr(&c, " deepen-since");
if (deepen_not_ok) strbuf_addstr(&c, " deepen-not");
+ if (object_filter_enabled(&args->filter_options))
+ strbuf_addstr(&c, (" " PROTOCOL_CAPABILITY_FILTER_OBJECTS));
if (agent_supported) strbuf_addf(&c, " agent=%s",
git_user_agent_sanitized());
packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
@@ -404,6 +406,18 @@ static int find_common(struct fetch_pack_args *args,
packet_buf_write(&req_buf, "deepen-not %s", s->string);
}
}
+
+ if (args->filter_options.omit_all_blobs)
+ packet_buf_write(&req_buf, PROTOCOL_REQUEST_FILTER_OMIT_ALL_BLOBS);
+ else if (args->filter_options.omit_large_blobs)
+ packet_buf_write(&req_buf,
+ PROTOCOL_REQUEST_FILTER_OMIT_LARGE_BLOBS " %lu",
+ args->filter_options.large_byte_limit);
+ else if (args->filter_options.use_sparse)
+ packet_buf_write(&req_buf,
+ PROTOCOL_REQUEST_FILTER_USE_SPARSE " %s",
+ args->filter_options.sparse_value);
+
packet_buf_flush(&req_buf);
state_len = req_buf.len;
@@ -811,6 +825,15 @@ static int get_pack(struct fetch_pack_args *args,
"--keep=fetch-pack %"PRIuMAX " on %s",
(uintmax_t)getpid(), hostname);
}
+
+ /*
+ * Relax consistency check to allow missing blobs (presumably
+ * because they are exactly the set that we requested be
+ * omitted).
+ */
+ if (object_filter_enabled(&args->filter_options))
+ argv_array_push(&cmd.args, ("--" CL_ARG_FILTER_RELAX));
+
if (args->check_self_contained_and_connected)
argv_array_push(&cmd.args, "--check-self-contained-and-connected");
}
@@ -924,6 +947,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
else
prefer_ofs_delta = 0;
+ if (server_supports(PROTOCOL_CAPABILITY_FILTER_OBJECTS))
+ print_verbose(args, _("Server supports "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+ else if (object_filter_enabled(&args->filter_options))
+ die(_("Server does not support "PROTOCOL_CAPABILITY_FILTER_OBJECTS));
+
if ((agent_feature = server_feature_value("agent", &agent_len))) {
agent_supported = 1;
if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43..5e6bf3b 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
#include "string-list.h"
#include "run-command.h"
+#include "object-filter.h"
struct oid_array;
@@ -12,6 +13,7 @@ struct fetch_pack_args {
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
+ struct object_filter_options filter_options;
unsigned deepen_relative:1;
unsigned quiet:1;
unsigned keep_pack:1;
diff --git a/transport.c b/transport.c
index 4d33138..7abf0b6 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,32 @@ static int set_git_option(struct git_transport_options *opts,
} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
opts->deepen_relative = !!value;
return 0;
+ } else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_ALL_BLOBS)) {
+ opts->filter_options.omit_all_blobs = !!value;
+ return 0;
+ } else if (!strcmp(name, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS)) {
+ opts->filter_options.omit_large_blobs = 1;
+ opts->filter_options.large_byte_limit_string = value;
+ if (!value)
+ opts->filter_options.large_byte_limit = 0;
+ else if (!git_parse_ulong(value,
+ &opts->filter_options.large_byte_limit))
+ die(_("transport: invalid filter value '%s'"), value);
+ return 0;
+ } else if (!strcmp(name, TRANS_OPT_FILTER_USE_SPARSE)) {
+ opts->filter_options.use_sparse = 1;
+ opts->filter_options.sparse_value = value;
+ /*
+ * We're constrained by the API for this set_ operation and
+ * only take a single value. We don't want to do the get_sha1*()
+ * lookup (possibly for the second time), because the caller
+ * should already know and have normalized the hex OID string
+ * (assuming that it used the normal parsing methods). So we
+ * assume that the above string value is sufficient here and
+ * can just NULL the binary OID field.
+ */
+ oidcpy(&opts->filter_options.sparse_oid, &null_oid);
+ return 0;
}
return 1;
}
@@ -228,6 +254,7 @@ static int fetch_refs_via_pack(struct transport *transport,
data->options.check_self_contained_and_connected;
args.cloning = transport->cloning;
args.update_shallow = data->options.update_shallow;
+ args.filter_options = data->options.filter_options;
if (!data->got_remote_heads) {
connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index bc55715..490f827 100644
--- a/transport.h
+++ b/transport.h
@@ -4,6 +4,8 @@
#include "cache.h"
#include "run-command.h"
#include "remote.h"
+#include "fetch-pack.h"
+#include "object-filter.h"
struct string_list;
@@ -21,6 +23,7 @@ struct git_transport_options {
const char *uploadpack;
const char *receivepack;
struct push_cas_option *cas;
+ struct object_filter_options filter_options;
};
enum transport_family {
@@ -210,6 +213,11 @@ void transport_check_allowed(const char *type);
/* Send push certificates */
#define TRANS_OPT_PUSH_CERT "pushcert"
+/* See Documentation/technical/pack-protocol.txt */
+#define TRANS_OPT_FILTER_OMIT_ALL_BLOBS "filter-omit-all-blobs"
+#define TRANS_OPT_FILTER_OMIT_LARGE_BLOBS "filter-omit-large-blobs"
+#define TRANS_OPT_FILTER_USE_SPARSE "filter-use-sparse"
+
/**
* Returns 0 if the option was used, non-zero otherwise. Prints a
* message to stderr if the option is not used.
--
2.9.3
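
The client-side gate in do_fetch_pack() depends on finding 'filter-objects' in the server's advertised capability list. A minimal sketch of such a lookup follows, assuming the list is a single space-separated string. `sketch_has_capability()` is a hypothetical helper, not git's server_supports(), and it deliberately ignores "name=value" capabilities such as agent.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` appears as a whole
 * space-separated token in `list`, 0 otherwise.
 */
static int sketch_has_capability(const char *list, const char *name)
{
	size_t name_len;
	const char *p;

	assert(list != NULL && name != NULL);
	name_len = strlen(name);
	if (!name_len)
		return 0;

	p = list;
	while ((p = strstr(p, name)) != NULL) {
		int starts_token = (p == list || p[-1] == ' ');
		char after = p[name_len];

		if (starts_token && (after == '\0' || after == ' '))
			return 1;
		p++;	/* keep scanning past this partial match */
	}
	return 0;
}
```

Matching whole tokens matters here because capability names can share prefixes: "side-band" must not match inside "side-band-64k", and a hypothetical future "filter-objects-v2" must not be mistaken for "filter-objects".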
* [PATCH v2 16/19] connected: add filter_allow_omitted option to API
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (14 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
` (2 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
connected.c | 3 +++
connected.h | 6 ++++++
2 files changed, 9 insertions(+)
diff --git a/connected.c b/connected.c
index 136c2ac..c25b816 100644
--- a/connected.c
+++ b/connected.c
@@ -62,6 +62,9 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
argv_array_pushf(&rev_list.args, "--progress=%s",
_("Checking connectivity"));
+ if (opt->filter_relax)
+ argv_array_push(&rev_list.args, ("--" CL_ARG_FILTER_RELAX));
+
rev_list.git_cmd = 1;
rev_list.env = opt->env;
rev_list.in = -1;
diff --git a/connected.h b/connected.h
index 4ca325f..370710e 100644
--- a/connected.h
+++ b/connected.h
@@ -35,6 +35,12 @@ struct check_connected_options {
int progress;
/*
+ * Relax consistency checks for missing blobs (presumably
+ * due to the use of the 'filter-objects' feature).
+ */
+ int filter_relax;
+
+ /*
* Insert these variables into the environment of the child process.
*/
const char **env;
--
2.9.3
* [PATCH v2 17/19] clone: add filter arguments
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (15 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
builtin/clone.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/builtin/clone.c b/builtin/clone.c
index a6ae7d6..1408396 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -24,6 +24,7 @@
#include "remote.h"
#include "run-command.h"
#include "connected.h"
+#include "object-filter.h"
/*
* Overall FIXMEs:
@@ -57,6 +58,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
static int option_dissociate;
static int max_jobs = -1;
static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct object_filter_options filter_options;
static int recurse_submodules_cb(const struct option *opt,
const char *arg, int unset)
@@ -130,6 +132,14 @@ static struct option builtin_clone_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+
+ OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+ /* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+ /* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
OPT_END()
};
@@ -643,6 +653,13 @@ static void update_remote_refs(const struct ref *refs,
if (check_connectivity) {
struct check_connected_options opt = CHECK_CONNECTED_INIT;
+ /*
+ * Relax consistency check to allow missing blobs (presumably
+ * because they are exactly the set that we requested be
+ * omitted).
+ */
+ opt.filter_relax = object_filter_enabled(&filter_options);
+
opt.transport = transport;
opt.progress = transport->progress;
@@ -1059,6 +1076,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
warning(_("--shallow-since is ignored in local clones; use file:// instead."));
if (option_not.nr)
warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+ if (object_filter_enabled(&filter_options))
+ warning(_("--filter-* options are ignored in local clones; use file:// instead."));
if (!access(mkpath("%s/shallow", path), F_OK)) {
if (option_local > 0)
warning(_("source repository is shallow, ignoring --local"));
@@ -1090,6 +1109,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_set_option(transport, TRANS_OPT_UPLOADPACK,
option_upload_pack);
+ if (filter_options.omit_all_blobs)
+ transport_set_option(transport, TRANS_OPT_FILTER_OMIT_ALL_BLOBS, "1");
+ if (filter_options.omit_large_blobs)
+ transport_set_option(transport, TRANS_OPT_FILTER_OMIT_LARGE_BLOBS,
+ filter_options.large_byte_limit_string);
+ if (filter_options.use_sparse)
+ transport_set_option(transport, TRANS_OPT_FILTER_USE_SPARSE,
+ filter_options.sparse_value);
+
if (transport->smart_options && !deepen)
transport->smart_options->check_self_contained_and_connected = 1;
--
2.9.3
* [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (16 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
builtin/index-pack.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4ff567d..30ff409 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -11,6 +11,7 @@
#include "exec_cmd.h"
#include "streaming.h"
#include "thread-utils.h"
+#include "object-filter.h"
static const char index_pack_usage[] =
"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -80,6 +81,7 @@ static int verbose;
static int show_resolving_progress;
static int show_stat;
static int check_self_contained_and_connected;
+static int filter_relax;
static struct progress *progress;
@@ -220,6 +222,17 @@ static unsigned check_object(struct object *obj)
if (!(obj->flags & FLAG_CHECKED)) {
unsigned long size;
int type = sha1_object_info(obj->oid.hash, &size);
+
+ if (type <= 0 && filter_relax) {
+ /*
+ * Relax consistency checks to not complain about
+ * omitted objects (presumably caused by use of
+ * the 'filter-objects' feature).
+ */
+ obj->flags |= FLAG_CHECKED;
+ return 0;
+ }
+
if (type <= 0)
die(_("did not receive expected object %s"),
oid_to_hex(&obj->oid));
@@ -1721,6 +1734,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
die(_("bad %s"), arg);
} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
max_input_size = strtoumax(arg, NULL, 10);
+ } else if (!strcmp(arg, ("--"CL_ARG_FILTER_RELAX))) {
+ filter_relax = 1;
} else
usage(index_pack_usage);
continue;
--
2.9.3
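
The decision added to check_object() can be modeled as a small pure function, which makes the effect of the relax flag easier to see. This is a sketch with hypothetical names (`sketch_check_result`, `sketch_check_object()`), where `type` stands in for the object lookup result and a value <= 0 means the object is not available locally.

```c
#include <assert.h>

enum sketch_check_result {
	SKETCH_OBJECT_PRESENT,	/* object found; normal checks continue */
	SKETCH_OBJECT_SKIPPED,	/* missing, but tolerated because of relax */
	SKETCH_OBJECT_FATAL	/* missing and not tolerated */
};

/* Hypothetical model of the decision made in check_object(). */
static enum sketch_check_result sketch_check_object(int type, int filter_relax)
{
	/* filter_relax is treated as a boolean flag in this sketch */
	assert(filter_relax == 0 || filter_relax == 1);
	if (type > 0)
		return SKETCH_OBJECT_PRESENT;
	return filter_relax ? SKETCH_OBJECT_SKIPPED : SKETCH_OBJECT_FATAL;
}
```

As in the patch, the relaxed path cannot tell an intentionally omitted blob from any other missing object: with the flag set, every failed lookup is skipped, whatever the object's type would have been.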
* [PATCH v2 19/19] fetch: add object filtering to fetch
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
` (17 preceding siblings ...)
2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
@ 2017-07-13 17:34 ` Jeff Hostetler
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Hostetler @ 2017-07-13 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, peff, ethomson, jonathantanmy, jrnieder, jeffhost
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
builtin/fetch.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f2c2ab..306c165 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,7 @@
#include "connected.h"
#include "argv-array.h"
#include "utf8.h"
+#include "object-filter.h"
static const char * const builtin_fetch_usage[] = {
N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -52,6 +53,7 @@ static const char *recurse_submodules_default;
static int shown_url = 0;
static int refmap_alloc, refmap_nr;
static const char **refmap_array;
+static struct object_filter_options filter_options;
static int option_parse_recurse_submodules(const struct option *opt,
const char *arg, int unset)
@@ -141,6 +143,14 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+
+ OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+ OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+ /* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+ /* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
OPT_END()
};
@@ -733,6 +743,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
int want_status;
int summary_width = transport_summary_width(ref_map);
+ struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+ /*
+ * Relax consistency check to allow missing blobs (presumably
+ * because they are exactly the set that we requested be
+ * omitted).
+ */
+ opt.filter_relax = object_filter_enabled(&filter_options);
fp = fopen(filename, "a");
if (!fp)
@@ -744,7 +762,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
url = xstrdup("foreign");
rm = ref_map;
- if (check_connected(iterate_ref_map, &rm, NULL)) {
+ if (check_connected(iterate_ref_map, &rm, &opt)) {
rc = error(_("%s did not send all necessary objects\n"), url);
goto abort;
}
@@ -885,6 +903,13 @@ static int quickfetch(struct ref *ref_map)
struct check_connected_options opt = CHECK_CONNECTED_INIT;
/*
+ * Relax consistency check to allow missing blobs (presumably
+ * because they are exactly the set that we requested be
+ * omitted).
+ */
+ opt.filter_relax = object_filter_enabled(&filter_options);
+
+ /*
* If we are deepening a shallow clone we already have these
* objects reachable. Running rev-list here will return with
* a good (0) exit status and we'll bypass the fetch that we
--
2.9.3