* [PATCH v4 1/8] fetch-pack: refactor packet writing
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
@ 2022-05-02 17:08 ` Calvin Wan
2022-05-02 17:08 ` [PATCH v4 2/8] fetch-pack: move fetch default settings Calvin Wan
` (15 subsequent siblings)
16 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:08 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
A subsequent patch will need to write capabilities for another command,
so refactor write_fetch_command_and_capabilities() to be used by both
fetch and future command.
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 4e1e88eea0..e06125c90a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1246,13 +1246,13 @@ static int add_haves(struct fetch_negotiator *negotiator,
return haves_added;
}
-static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
+static void write_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options, const char* command)
{
const char *hash_name;
- if (server_supports_v2("fetch", 1))
- packet_buf_write(req_buf, "command=fetch");
+ if (server_supports_v2(command, 1))
+ packet_buf_write(req_buf, "command=%s", command);
if (server_supports_v2("agent", 0))
packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
if (advertise_sid && server_supports_v2("session-id", 0))
@@ -1288,7 +1288,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
int done_sent = 0;
struct strbuf req_buf = STRBUF_INIT;
- write_fetch_command_and_capabilities(&req_buf, args->server_options);
+ write_command_and_capabilities(&req_buf, args->server_options, "fetch");
if (args->use_thin_pack)
packet_buf_write(&req_buf, "thin-pack");
@@ -2072,7 +2072,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
int received_ready = 0;
strbuf_reset(&req_buf);
- write_fetch_command_and_capabilities(&req_buf, server_options);
+ write_command_and_capabilities(&req_buf, server_options, "fetch");
packet_buf_write(&req_buf, "wait-for-done");
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* [PATCH v4 2/8] fetch-pack: move fetch default settings
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
2022-05-02 17:08 ` [PATCH v4 1/8] fetch-pack: refactor packet writing Calvin Wan
@ 2022-05-02 17:08 ` Calvin Wan
2022-05-02 22:58 ` Junio C Hamano
2022-05-02 17:08 ` [PATCH v4 3/8] object-store: add function to free object_info contents Calvin Wan
` (14 subsequent siblings)
16 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:08 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
When the state machine in do_fetch_pack_v2() is in FETCH_CHECK_LOCAL, we
set a few v2-specific defaults. It will be helpful for a future patch to
have these defaults set if the initial state is not FETCH_CHECK_LOCAL.
This is a safe change since the initial state is currently always
FETCH_CHECK_LOCAL, so we're guaranteed to hit that code whether it's in
the state machine or not.
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index e06125c90a..45473b4602 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1610,18 +1610,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
reader.me = "fetch-pack";
}
+ /* v2 supports these by default */
+ allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
+ use_sideband = 2;
+ if (args->depth > 0 || args->deepen_since || args->deepen_not)
+ args->deepen = 1;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
- /* v2 supports these by default */
- allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
- use_sideband = 2;
- if (args->depth > 0 || args->deepen_since || args->deepen_not)
- args->deepen = 1;
-
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 2/8] fetch-pack: move fetch default settings
2022-05-02 17:08 ` [PATCH v4 2/8] fetch-pack: move fetch default settings Calvin Wan
@ 2022-05-02 22:58 ` Junio C Hamano
2022-05-03 23:06 ` Jonathan Tan
0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-02 22:58 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> When the state machine in do_fetch_pack_v2() is in FETCH_CHECK_LOCAL, we
> set a few v2-specific defaults. It will be helpful for a future patch to
> have these defaults set if the initial state is not FETCH_CHECK_LOCAL.
> This is a safe change since the initial state is currently always
> FETCH_CHECK_LOCAL, so we're guaranteed to hit that code whether it's in
> the state machine or not.
What does "it" (which supposed to be able to be in the state machine
and also not to be in the state matchine) in the last sentence refer
to?
>
> ---
Missing sign-off.
The patch looks correct and I agree that this is a benign no-op
because the initial value of state is FETCH_CHECK_LOCAL (i.e. the
block of code moved here will execute pretty much as the first
thing, and the relative order between that block and sorting of ref
list should not matter). I just didn't understand the explanation
given by the patch why it is safe.
Thanks.
> fetch-pack.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index e06125c90a..45473b4602 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1610,18 +1610,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> reader.me = "fetch-pack";
> }
>
> + /* v2 supports these by default */
> + allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
> + use_sideband = 2;
> + if (args->depth > 0 || args->deepen_since || args->deepen_not)
> + args->deepen = 1;
> +
> while (state != FETCH_DONE) {
> switch (state) {
> case FETCH_CHECK_LOCAL:
> sort_ref_list(&ref, ref_compare_name);
> QSORT(sought, nr_sought, cmp_ref_by_name);
>
> - /* v2 supports these by default */
> - allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
> - use_sideband = 2;
> - if (args->depth > 0 || args->deepen_since || args->deepen_not)
> - args->deepen = 1;
> -
> /* Filter 'ref' by 'sought' and those that aren't local */
> mark_complete_and_common_ref(negotiator, args, &ref);
> filter_refs(args, &ref, sought, nr_sought);
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 2/8] fetch-pack: move fetch default settings
2022-05-02 22:58 ` Junio C Hamano
@ 2022-05-03 23:06 ` Jonathan Tan
2022-05-05 18:13 ` Calvin Wan
0 siblings, 1 reply; 103+ messages in thread
From: Jonathan Tan @ 2022-05-03 23:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Tan, Calvin Wan, git, philipoakley, johncai86, me
Junio C Hamano <gitster@pobox.com> writes:
> Calvin Wan <calvinwan@google.com> writes:
>
> > When the state machine in do_fetch_pack_v2() is in FETCH_CHECK_LOCAL, we
> > set a few v2-specific defaults. It will be helpful for a future patch to
> > have these defaults set if the initial state is not FETCH_CHECK_LOCAL.
>
>
> > This is a safe change since the initial state is currently always
> > FETCH_CHECK_LOCAL, so we're guaranteed to hit that code whether it's in
> > the state machine or not.
>
> What does "it" (which supposed to be able to be in the state machine
> and also not to be in the state matchine) in the last sentence refer
> to?
I think the "it" refers to the initialization code that was moved.
> The patch looks correct and I agree that this is a benign no-op
> because the initial value of state is FETCH_CHECK_LOCAL (i.e. the
> block of code moved here will execute pretty much as the first
> thing, and the relative order between that block and sorting of ref
> list should not matter). I just didn't understand the explanation
> given by the patch why it is safe.
>
> Thanks.
Agreed - I think the commit message would be better off if it was worded
something like:
There are some initialization steps that need to be done at the start
of the execution of the do_fetch_pack_v2() state machine. Currently,
these are done in the FETCH_CHECK_LOCAL state, which is the initial
state.
However, a subsequent patch will allow for another initial state,
while still requiring these initialization steps. Therefore, move
these initialization steps to before the state machine, so that they
are run regardless of the initial state.
I think this description suffices for a reviewer to see that it is safe,
but if you want, you can add:
Note that there is no change in behavior, because we're moving code
from the beginning of the first state to just before the execution of
the state machine.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 2/8] fetch-pack: move fetch default settings
2022-05-03 23:06 ` Jonathan Tan
@ 2022-05-05 18:13 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-05 18:13 UTC (permalink / raw)
To: Jonathan Tan; +Cc: Junio C Hamano, git, philipoakley, johncai86, me
Done thanks for the suggestions
On Tue, May 3, 2022 at 4:07 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
> > Calvin Wan <calvinwan@google.com> writes:
> >
> > > When the state machine in do_fetch_pack_v2() is in FETCH_CHECK_LOCAL, we
> > > set a few v2-specific defaults. It will be helpful for a future patch to
> > > have these defaults set if the initial state is not FETCH_CHECK_LOCAL.
> >
> >
> > > This is a safe change since the initial state is currently always
> > > FETCH_CHECK_LOCAL, so we're guaranteed to hit that code whether it's in
> > > the state machine or not.
> >
> > What does "it" (which supposed to be able to be in the state machine
> > and also not to be in the state matchine) in the last sentence refer
> > to?
>
> I think the "it" refers to the initialization code that was moved.
>
> > The patch looks correct and I agree that this is a benign no-op
> > because the initial value of state is FETCH_CHECK_LOCAL (i.e. the
> > block of code moved here will execute pretty much as the first
> > thing, and the relative order between that block and sorting of ref
> > list should not matter). I just didn't understand the explanation
> > given by the patch why it is safe.
> >
> > Thanks.
>
> Agreed - I think the commit message would be better off if it was worded
> something like:
>
> There are some initialization steps that need to be done at the start
> of the execution of the do_fetch_pack_v2() state machine. Currently,
> these are done in the FETCH_CHECK_LOCAL state, which is the initial
> state.
>
> However, a subsequent patch will allow for another initial state,
> while still requiring these initialization steps. Therefore, move
> these initialization steps to before the state machine, so that they
> are run regardless of the initial state.
>
> I think this description suffices for a reviewer to see that it is safe,
> but if you want, you can add:
>
> Note that there is no change in behavior, because we're moving code
> from the beginning of the first state to just before the execution of
> the state machine.
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v4 3/8] object-store: add function to free object_info contents
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
2022-05-02 17:08 ` [PATCH v4 1/8] fetch-pack: refactor packet writing Calvin Wan
2022-05-02 17:08 ` [PATCH v4 2/8] fetch-pack: move fetch default settings Calvin Wan
@ 2022-05-02 17:08 ` Calvin Wan
2022-05-02 23:23 ` Junio C Hamano
2022-05-02 17:09 ` [PATCH v4 4/8] object-info: send attribute packet regardless of object ids Calvin Wan
` (13 subsequent siblings)
16 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:08 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
Introduce free_object_info_contents.
---
object-file.c | 16 ++++++++++++++++
object-store.h | 3 +++
2 files changed, 19 insertions(+)
diff --git a/object-file.c b/object-file.c
index 5ffbf3d4fd..34a6e34adf 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2659,3 +2659,19 @@ int read_loose_object(const char *path,
munmap(map, mapsize);
return ret;
}
+
+void free_object_info_contents(struct object_info *object_info)
+{
+ if (!object_info)
+ return;
+ if (object_info->typep)
+ free(object_info->typep);
+ if (object_info->sizep)
+ free(object_info->sizep);
+ if (object_info->disk_sizep)
+ free(object_info->disk_sizep);
+ if (object_info->delta_base_oid)
+ free(object_info->delta_base_oid);
+ if (object_info->type_name)
+ free(object_info->type_name);
+}
\ No newline at end of file
diff --git a/object-store.h b/object-store.h
index bd2322ed8c..840e04b56f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -533,4 +533,7 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);
+/* Free pointers inside of object_info, but not object_info itself */
+void free_object_info_contents(struct object_info *object_info);
+
#endif /* OBJECT_STORE_H */
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 3/8] object-store: add function to free object_info contents
2022-05-02 17:08 ` [PATCH v4 3/8] object-store: add function to free object_info contents Calvin Wan
@ 2022-05-02 23:23 ` Junio C Hamano
2022-05-04 19:09 ` Junio C Hamano
2022-05-05 0:15 ` Junio C Hamano
0 siblings, 2 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-05-02 23:23 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> Introduce free_object_info_contents.
>
> ---
> object-file.c | 16 ++++++++++++++++
> object-store.h | 3 +++
> 2 files changed, 19 insertions(+)
>
> diff --git a/object-file.c b/object-file.c
> index 5ffbf3d4fd..34a6e34adf 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2659,3 +2659,19 @@ int read_loose_object(const char *path,
> munmap(map, mapsize);
> return ret;
> }
> +
> +void free_object_info_contents(struct object_info *object_info)
> +{
> + if (!object_info)
> + return;
This is OK, but ...
> + if (object_info->typep)
> + free(object_info->typep);
> + if (object_info->sizep)
> + free(object_info->sizep);
> + if (object_info->disk_sizep)
> + free(object_info->disk_sizep);
> + if (object_info->delta_base_oid)
> + free(object_info->delta_base_oid);
> + if (object_info->type_name)
> + free(object_info->type_name);
if (PTR)
free(PTR);
can and should be written as
free(PTR);
If we are reusing object_info after calling this function, we
_might_ want to use FREE_AND_NULL() instead of free().
> +}
> \ No newline at end of file
Let's avoid such incomplete lines.
> diff --git a/object-store.h b/object-store.h
> index bd2322ed8c..840e04b56f 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -533,4 +533,7 @@ int for_each_object_in_pack(struct packed_git *p,
> int for_each_packed_object(each_packed_object_fn, void *,
> enum for_each_object_flags flags);
>
> +/* Free pointers inside of object_info, but not object_info itself */
> +void free_object_info_contents(struct object_info *object_info);
> +
> #endif /* OBJECT_STORE_H */
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 3/8] object-store: add function to free object_info contents
2022-05-02 23:23 ` Junio C Hamano
@ 2022-05-04 19:09 ` Junio C Hamano
2022-05-05 0:15 ` Junio C Hamano
1 sibling, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-05-04 19:09 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Junio C Hamano <gitster@pobox.com> writes:
>> +}
>> \ No newline at end of file
>
> Let's avoid such incomplete lines.
As this breaks my build cycle ("make sparse" is part of my
post-integration check), I have added this workaround on the tip of
the topic, but please make sure I do not have to redo that when you
reroll.
Thanks.
Subject: [PATCH] SQUASH??? compilation fix
"make sparse" rightfully complains that this file ends in an
incomplete line.
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index 34a6e34adf..c6addb6969 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2674,4 +2674,4 @@ void free_object_info_contents(struct object_info *object_info)
free(object_info->delta_base_oid);
if (object_info->type_name)
free(object_info->type_name);
-}
\ No newline at end of file
+}
--
2.36.0-244-ge81c54a17b
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 3/8] object-store: add function to free object_info contents
2022-05-02 23:23 ` Junio C Hamano
2022-05-04 19:09 ` Junio C Hamano
@ 2022-05-05 0:15 ` Junio C Hamano
2022-05-05 16:47 ` Calvin Wan
1 sibling, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-05 0:15 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Junio C Hamano <gitster@pobox.com> writes:
>> + if (object_info->typep)
>> + free(object_info->typep);
>> + if (object_info->sizep)
>> + free(object_info->sizep);
>> + if (object_info->disk_sizep)
>> + free(object_info->disk_sizep);
>> + if (object_info->delta_base_oid)
>> + free(object_info->delta_base_oid);
>> + if (object_info->type_name)
>> + free(object_info->type_name);
>
> if (PTR)
> free(PTR);
>
> can and should be written as
>
> free(PTR);
>
> If we are reusing object_info after calling this function, we
> _might_ want to use FREE_AND_NULL() instead of free().
> As this breaks my build cycle ("make sparse" is part of my
> post-integration check), I have added this workaround on the tip of
> the topic, but please make sure I do not have to redo that when you
> reroll.
>
> Thanks.
Again, this breaks the build at GitHub CI (static-analysis), I added
this workaround on the tip of the topic, merged it to 'seen' and
pushed the result out, but please make sure I do not have to redo
that when you reroll.
Thanks.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 3/8] object-store: add function to free object_info contents
2022-05-05 0:15 ` Junio C Hamano
@ 2022-05-05 16:47 ` Calvin Wan
2022-05-05 17:01 ` Junio C Hamano
0 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-05 16:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Apologies for the mistakes. Getting a script from Josh Steadmon to
make sure these things never happen again :)
On Wed, May 4, 2022 at 5:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> >> + if (object_info->typep)
> >> + free(object_info->typep);
> >> + if (object_info->sizep)
> >> + free(object_info->sizep);
> >> + if (object_info->disk_sizep)
> >> + free(object_info->disk_sizep);
> >> + if (object_info->delta_base_oid)
> >> + free(object_info->delta_base_oid);
> >> + if (object_info->type_name)
> >> + free(object_info->type_name);
> >
> > if (PTR)
> > free(PTR);
> >
> > can and should be written as
> >
> > free(PTR);
> >
> > If we are reusing object_info after calling this function, we
> > _might_ want to use FREE_AND_NULL() instead of free().
>
> > As this breaks my build cycle ("make sparse" is part of my
> > post-integration check), I have added this workaround on the tip of
> > the topic, but please make sure I do not have to redo that when you
> > reroll.
> >
> > Thanks.
>
> Again, this breaks the build at GitHub CI (static-analysis), I added
> this workaround on the tip of the topic, merged it to 'seen' and
> pushed the result out, but please make sure I do not have to redo
> that when you reroll.
>
> Thanks.
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v4 4/8] object-info: send attribute packet regardless of object ids
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (2 preceding siblings ...)
2022-05-02 17:08 ` [PATCH v4 3/8] object-store: add function to free object_info contents Calvin Wan
@ 2022-05-02 17:09 ` Calvin Wan
2022-05-03 0:05 ` Junio C Hamano
2022-05-03 23:11 ` Jonathan Tan
2022-05-02 17:09 ` [PATCH v4 5/8] transport: add client side capability to request object-info Calvin Wan
` (12 subsequent siblings)
16 siblings, 2 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:09 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
Currently on the server side of object-info, if the client does not send
any object ids in the request or if the client requests an attribute the
server does not support, the server will either not send any packets
back or error out. Consider the scenario where the client git version is
ahead of the server git version, and the client can request additional
object-info besides 'size'. There needs to be a way to tell whether the
server can honor all of the client attribute requests before the client
sends a request with all of the object ids. In a future patch, if the
client were to make an initial command request with only attributes, the
server would be able to confirm which attributes it could return.
---
protocol-caps.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/protocol-caps.c b/protocol-caps.c
index bbde91810a..bc7def0727 100644
--- a/protocol-caps.c
+++ b/protocol-caps.c
@@ -11,6 +11,7 @@
struct requested_info {
unsigned size : 1;
+ unsigned unknown : 1;
};
/*
@@ -40,12 +41,12 @@ static void send_info(struct repository *r, struct packet_writer *writer,
struct string_list_item *item;
struct strbuf send_buffer = STRBUF_INIT;
- if (!oid_str_list->nr)
- return;
-
if (info->size)
packet_writer_write(writer, "size");
+ if (info->unknown || !oid_str_list->nr)
+ goto release;
+
for_each_string_list_item (item, oid_str_list) {
const char *oid_str = item->string;
struct object_id oid;
@@ -72,12 +73,13 @@ static void send_info(struct repository *r, struct packet_writer *writer,
packet_writer_write(writer, "%s", send_buffer.buf);
strbuf_reset(&send_buffer);
}
+release:
strbuf_release(&send_buffer);
}
int cap_object_info(struct repository *r, struct packet_reader *request)
{
- struct requested_info info;
+ struct requested_info info = { 0 };
struct packet_writer writer;
struct string_list oid_str_list = STRING_LIST_INIT_DUP;
@@ -92,9 +94,7 @@ int cap_object_info(struct repository *r, struct packet_reader *request)
if (parse_oid(request->line, &oid_str_list))
continue;
- packet_writer_error(&writer,
- "object-info: unexpected line: '%s'",
- request->line);
+ info.unknown = 1;
}
if (request->status != PACKET_READ_FLUSH) {
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 4/8] object-info: send attribute packet regardless of object ids
2022-05-02 17:09 ` [PATCH v4 4/8] object-info: send attribute packet regardless of object ids Calvin Wan
@ 2022-05-03 0:05 ` Junio C Hamano
2022-05-03 19:21 ` Calvin Wan
2022-05-03 23:11 ` Jonathan Tan
1 sibling, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-03 0:05 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> Currently on the server side of object-info, if the client does not send
> any object ids in the request or if the client requests an attribute the
> server does not support, the server will either not send any packets
> back or error out.
There is an early return when oid_str_list->nr (i.e. number of
objects we are queried) is zero, and we return without showing the
"size" token in the output (called "attribute"). The change in this
patch changes the behaviour and makes such a request responded with
"size" but without any size information for no objects.
It is unclear why that is a good change, though.
The loop that accepts requests in today's code sends an error when
it sees a line that it does not understand. The patch stops doing
so. Also, after ignoring such an unknown line, the requests that
were understood are responded to by send_info() function. The patch
instead refuses to give any output in such a case.
It is unclear why either of these two makes it a good change,
though.
> Consider the scenario where the client git version is
> ahead of the server git version, and the client can request additional
> object-info besides 'size'. There needs to be a way to tell whether the
> server can honor all of the client attribute requests before the client
> sends a request with all of the object ids.
Yes. If we want to learn about "size", "frotz", and "nitfol"
attributes about these objects, we can send "size", "frotz",
"nitfol", and then start feeding object names. Then we can observe
that "frotz" were not liked to learn that this old server does not
understand it, and the same for "nitfol". At least we would have
learned about size of these objects, without this patch.
Now, an updated server side with this patch would respond with
"size" and no other response. Does it mean it does not understand
"frotz", it does not understand "nitfol", or neither? Did we get no
response because we didn't feed objects, or because we asked for
something they don't know about?
How well does the current client work with today's server side and
the server side updated with this patch? I _think_ this change is
essentially a no-op when there is no version skew (i.e. we do not
ask for anything today's server does not know about and we do not
necessarily ask today's server about zero objects).
Am I correct to assume that those who are talking with today's
server side are all out-of-tree custom clients?
I am wondering how much damage we are causing them with this change
in behaviour, especially with the lack of "You asked for X, but I
don't understand X", which is replaced by no output, which would be
totally unexpected by them.
It totally is possible that this "object-info" request is treated as
an experimental curiosity that is not ready for production and has
no existing users. If this were in 2006, I would just _declare_
such is a case and move on, breaking the protocol for existing users
whose number is zero. But I am afraid that we no longer live in
such a simpler world, so...
> In a future patch, if the
> client were to make an initial command request with only attributes, the
> server would be able to confirm which attributes it could return.
>
> ---
> protocol-caps.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/protocol-caps.c b/protocol-caps.c
> index bbde91810a..bc7def0727 100644
> --- a/protocol-caps.c
> +++ b/protocol-caps.c
> @@ -11,6 +11,7 @@
>
> struct requested_info {
> unsigned size : 1;
> + unsigned unknown : 1;
> };
OK.
> /*
> @@ -40,12 +41,12 @@ static void send_info(struct repository *r, struct packet_writer *writer,
> struct string_list_item *item;
> struct strbuf send_buffer = STRBUF_INIT;
>
> - if (!oid_str_list->nr)
> - return;
> -
> if (info->size)
> packet_writer_write(writer, "size");
>
> + if (info->unknown || !oid_str_list->nr)
> + goto release;
> +
> for_each_string_list_item (item, oid_str_list) {
> const char *oid_str = item->string;
> struct object_id oid;
> @@ -72,12 +73,13 @@ static void send_info(struct repository *r, struct packet_writer *writer,
> packet_writer_write(writer, "%s", send_buffer.buf);
> strbuf_reset(&send_buffer);
> }
> +release:
> strbuf_release(&send_buffer);
> }
OK, except for the bypass for info->unknown case, which I am not
sure about.
> int cap_object_info(struct repository *r, struct packet_reader *request)
> {
> - struct requested_info info;
> + struct requested_info info = { 0 };
Wow. We have been using info uninitialized? Is this a silent
bugfix?
If info.size bit was on due to on-stack garbage, we would have given
our response with "size" attribute prefixed, even when the client
side never requested it.
> @@ -92,9 +94,7 @@ int cap_object_info(struct repository *r, struct packet_reader *request)
> if (parse_oid(request->line, &oid_str_list))
> continue;
>
> - packet_writer_error(&writer,
> - "object-info: unexpected line: '%s'",
> - request->line);
> + info.unknown = 1;
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 4/8] object-info: send attribute packet regardless of object ids
2022-05-03 0:05 ` Junio C Hamano
@ 2022-05-03 19:21 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-03 19:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86, me
> Now, an updated server side with this patch would respond with
> "size" and no other response. Does it mean it does not understand
> "frotz", it does not understand "nitfol", or neither? Did we get no
> response because we didn't feed objects, or because we asked for
> something they don't know about?
Patches 4-6 describe the following design idea I had:
- Send initial request with only attributes
- If server can fulfill object-info attributes, then go ahead
- Else fallback to fetch
I responded to your comments on patch 5 regarding not needing
that first request, and sending the entire request from the beginning
so I imagine v5 would look something like this instead:
- Send full request
- Server either responds with a fulfilled request or only the attributes
it can return.
- Fallback to fetch if request is unfulfilled
> How well does the current client work with today's server side and
> the server side updated with this patch? I _think_ this change is
> essentially a no-op when there is no version skew
You are correct that this change is essentially a no-op when there is
no version skew, which is intended. There is no current client; it's
implemented in patch 5.
> Am I correct to assume that those who are talking with today's
> server side are all out-of-tree custom clients?
> I am wondering how much damage we are causing them with this change
> in behaviour, especially with the lack of "You asked for X, but I
> don't understand X", which is replaced by no output, which would be
> totally unexpected by them.
Internally we don't have anyone using this yet. If we want to be
totally safe, I am also OK with having object-info return everything
it understands about the request rather than nothing. This was
intended to be an optimization since we would be defaulting back
to fetch to get object-info rather than only returning what object-info
knows.
> Wow. We have been using info uninitialized? Is this a silent
> bugfix?
Yes unfortunately.
On Mon, May 2, 2022 at 5:06 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Calvin Wan <calvinwan@google.com> writes:
>
> > Currently on the server side of object-info, if the client does not send
> > any object ids in the request or if the client requests an attribute the
> > server does not support, the server will either not send any packets
> > back or error out.
>
> There is an early return when oid_str_list->nr (i.e. number of
> objects we are queried) is zero, and we return without showing the
> "size" token in the output (called "attribute"). The change in this
> patch changes the behaviour and makes such a request responded with
> "size" but without any size information for no objects.
>
> It is unclear why that is a good change, though.
>
> The loop that accepts requests in today's code sends an error when
> it sees a line that it does not understand. The patch stops doing
> so. Also, after ignoring such an unknown line, the requests that
> were understood are responded to by send_info() function. The patch
> instead refuses to give any output in such a case.
>
> It is unclear why either of these two makes it a good change,
> though.
>
> > Consider the scenario where the client git version is
> > ahead of the server git version, and the client can request additional
> > object-info besides 'size'. There needs to be a way to tell whether the
> > server can honor all of the client attribute requests before the client
> > sends a request with all of the object ids.
>
> Yes. If we want to learn about "size", "frotz", and "nitfol"
> attributes about these objects, we can send "size", "frotz",
> "nitfol", and then start feeding object names. Then we can observe
> that "frotz" were not liked to learn that this old server does not
> understand it, and the same for "nitfol". At least we would have
> learned about size of these objects, without this patch.
>
> Now, an updated server side with this patch would respond with
> "size" and no other response. Does it mean it does not understand
> "frotz", it does not understand "nitfol", or neither? Did we get no
> response because we didn't feed objects, or because we asked for
> something they don't know about?
>
> How well does the current client work with today's server side and
> the server side updated with this patch? I _think_ this change is
> essentially a no-op when there is no version skew (i.e. we do not
> ask for anything today's server does not know about and we do not
> necessarily ask today's server about zero objects).
>
> Am I correct to assume that those who are talking with today's
> server side are all out-of-tree custom clients?
>
> I am wondering how much damage we are causing them with this change
> in behaviour, especially with the lack of "You asked for X, but I
> don't understand X", which is replaced by no output, which would be
> totally unexpected by them.
>
> It totally is possible that this "object-info" request is treated as
> an experimental curiosity that is not ready for production and has
> no existing users. If this were in 2006, I would just _declare_
> such is a case and move on, breaking the protocol for existing users
> whose number is zero. But I am afraid that we no longer live in
> such a simpler world, so...
>
> > In a future patch, if the
> > client were to make an initial command request with only attributes, the
> > server would be able to confirm which attributes it could return.
> >
> > ---
> > protocol-caps.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/protocol-caps.c b/protocol-caps.c
> > index bbde91810a..bc7def0727 100644
> > --- a/protocol-caps.c
> > +++ b/protocol-caps.c
> > @@ -11,6 +11,7 @@
> >
> > struct requested_info {
> > unsigned size : 1;
> > + unsigned unknown : 1;
> > };
>
> OK.
>
> > /*
> > @@ -40,12 +41,12 @@ static void send_info(struct repository *r, struct packet_writer *writer,
> > struct string_list_item *item;
> > struct strbuf send_buffer = STRBUF_INIT;
> >
> > - if (!oid_str_list->nr)
> > - return;
> > -
> > if (info->size)
> > packet_writer_write(writer, "size");
> >
> > + if (info->unknown || !oid_str_list->nr)
> > + goto release;
> > +
> > for_each_string_list_item (item, oid_str_list) {
> > const char *oid_str = item->string;
> > struct object_id oid;
> > @@ -72,12 +73,13 @@ static void send_info(struct repository *r, struct packet_writer *writer,
> > packet_writer_write(writer, "%s", send_buffer.buf);
> > strbuf_reset(&send_buffer);
> > }
> > +release:
> > strbuf_release(&send_buffer);
> > }
>
> OK, except for the bypass for info->unknown case, which I am not
> sure about.
>
> > int cap_object_info(struct repository *r, struct packet_reader *request)
> > {
> > - struct requested_info info;
> > + struct requested_info info = { 0 };
>
> Wow. We have been using info uninitialized? Is this a silent
> bugfix?
>
> If info.size bit was on due to on-stack garbage, we would have given
> our response with "size" attribute prefixed, even when the client
> side never requested it.
>
> > @@ -92,9 +94,7 @@ int cap_object_info(struct repository *r, struct packet_reader *request)
> > if (parse_oid(request->line, &oid_str_list))
> > continue;
> >
> > - packet_writer_error(&writer,
> > - "object-info: unexpected line: '%s'",
> > - request->line);
> > + info.unknown = 1;
>
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 4/8] object-info: send attribute packet regardless of object ids
2022-05-02 17:09 ` [PATCH v4 4/8] object-info: send attribute packet regardless of object ids Calvin Wan
2022-05-03 0:05 ` Junio C Hamano
@ 2022-05-03 23:11 ` Jonathan Tan
1 sibling, 0 replies; 103+ messages in thread
From: Jonathan Tan @ 2022-05-03 23:11 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, gitster, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> Currently on the server side of object-info, if the client does not send
> any object ids in the request or if the client requests an attribute the
> server does not support, the server will either not send any packets
> back or error out. Consider the scenario where the client git version is
> ahead of the server git version, and the client can request additional
> object-info besides 'size'. There needs to be a way to tell whether the
> server can honor all of the client attribute requests before the client
> sends a request with all of the object ids.
This part explains the problem...
> In a future patch, if the
> client were to make an initial command request with only attributes, the
> server would be able to confirm which attributes it could return.
...and this part explains what we want to do in the future, but this
commit message doesn't explain what this commit does. (The commit
subject has something related, but does not have enough detail. For
example, is it an empty attribute packet?)
Having said that, the way we usually communicate client-server version
differences is through capabilities - is there a reason why we can't use
that here?
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (3 preceding siblings ...)
2022-05-02 17:09 ` [PATCH v4 4/8] object-info: send attribute packet regardless of object ids Calvin Wan
@ 2022-05-02 17:09 ` Calvin Wan
2022-05-03 0:54 ` Junio C Hamano
2022-05-03 23:15 ` Jonathan Tan
2022-05-02 17:09 ` [PATCH v4 6/8] transport: add object-info fallback to fetch Calvin Wan
` (11 subsequent siblings)
16 siblings, 2 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:09 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
Sometimes it is useful to get information about an object without having
to download it completely. The server logic has already been implemented
as “a2ba162cda (object-info: support for retrieving object info,
2021-04-20)”. This patch implements the client option for it. Currently,
only 'size' is supported and the server must be v2, however future
patches can implement additional options.
The question of version mismatch often comes up with client/server
relationships. There are two cases to consider here (assuming either
client or server functionality for object-info changes between the
different versions):
1. client version > server version
- client can request additional attributes from server
2. server version > client version
- server can return additional info to the client
The second case is a non-issue since the client would never be able to
request additional info from the server. In order to solve the first
case, recall an earlier patch where the server sends back the attributes
even if no object ids are sent by the client. This allows the client to
first check whether the server can accept the requested attributes
before sending the entire request.
---
fetch-pack.c | 23 +++++++++++++
fetch-pack.h | 8 +++++
transport-helper.c | 7 ++--
transport.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
transport.h | 11 +++++++
5 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 45473b4602..506ca28e05 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1278,6 +1278,29 @@ static void write_command_and_capabilities(struct strbuf *req_buf,
packet_buf_delim(req_buf);
}
+void send_object_info_request(int fd_out, struct object_info_args *args)
+{
+ struct strbuf req_buf = STRBUF_INIT;
+ struct string_list unsorted_object_info_options = *args->object_info_options;
+ size_t i;
+
+ write_command_and_capabilities(&req_buf, args->server_options, "object-info");
+
+ if (unsorted_string_list_has_string(&unsorted_object_info_options, "size"))
+ packet_buf_write(&req_buf, "size");
+
+ if (args->oids) {
+ for (i = 0; i < args->oids->nr; i++)
+ packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
+ }
+
+ packet_buf_flush(&req_buf);
+ if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
+ die_errno(_("unable to write request to remote"));
+
+ strbuf_release(&req_buf);
+}
+
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
struct fetch_pack_args *args,
const struct ref *wants, struct oidset *common,
diff --git a/fetch-pack.h b/fetch-pack.h
index 8c7752fc82..c27018d48c 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -69,6 +69,12 @@ struct fetch_pack_args {
unsigned connectivity_checked:1;
};
+struct object_info_args {
+ const struct string_list *object_info_options;
+ const struct string_list *server_options;
+ const struct oid_array *oids;
+};
+
/*
* sought represents remote references that should be updated from.
* On return, the names that were found on the remote will have been
@@ -102,4 +108,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+void send_object_info_request(int fd_out, struct object_info_args *args);
+
#endif
diff --git a/transport-helper.c b/transport-helper.c
index b4dbbabb0c..093946f7fd 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport,
/*
* If we reach here, then the server, the client, and/or the transport
- * helper does not support protocol v2. --negotiate-only requires
- * protocol v2.
+ * helper does not support protocol v2. --negotiate-only and --object-info
+ * require protocol v2.
*/
if (data->transport_options.acked_commits) {
warning(_("--negotiate-only requires protocol v2"));
return -1;
}
+ if (transport->smart_options->object_info) {
+ die(_("--object-info requires protocol v2"));
+ }
if (!data->get_refs_list_called)
get_refs_list_using_list(transport, 0);
diff --git a/transport.c b/transport.c
index 3d64a43ab3..08c505e1d0 100644
--- a/transport.c
+++ b/transport.c
@@ -353,6 +353,79 @@ static struct ref *handshake(struct transport *transport, int for_push,
return refs;
}
+static int fetch_object_info(struct transport *transport, struct object_info **object_info_data)
+{
+ size_t i;
+ int size_index = -1;
+ struct git_transport_data *data = transport->data;
+ struct object_info_args args;
+ struct packet_reader reader;
+ struct string_list server_attributes = STRING_LIST_INIT_DUP;
+
+ memset(&args, 0, sizeof(args));
+ args.server_options = transport->server_options;
+ args.object_info_options = transport->smart_options->object_info_options;
+
+ connect_setup(transport, 0);
+ packet_reader_init(&reader, data->fd[0], NULL, 0,
+ PACKET_READ_CHOMP_NEWLINE |
+ PACKET_READ_DIE_ON_ERR_PACKET);
+ data->version = discover_version(&reader);
+
+ transport->hash_algo = reader.hash_algo;
+
+ switch (data->version) {
+ case protocol_v2:
+ if (!server_supports_v2("object-info", 0))
+ return -1;
+ /**
+ * Send a request with only attributes first. If server can return all
+ * of the requested attributes, then send request with object ids
+ */
+ send_object_info_request(data->fd[1], &args);
+ if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+ return -1;
+ }
+ string_list_split(&server_attributes, reader.line, ' ', -1);
+ packet_reader_read(&reader);
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+ if (server_attributes.nr != args.object_info_options->nr)
+ return -1;
+ for (i = 0; i < server_attributes.nr; i++) {
+ if (!strcmp(server_attributes.items[i].string, "size"))
+ size_index = i + 1;
+ }
+ args.oids = transport->smart_options->object_info_oids;
+ send_object_info_request(data->fd[1], &args);
+ break;
+ case protocol_v1:
+ case protocol_v0:
+ die(_("wrong protocol version. expected v2"));
+ case protocol_unknown_version:
+ BUG("unknown protocol version");
+ }
+
+ if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
+ die(_("error reading object info header"));
+ i = 0;
+ while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
+ struct string_list object_info_values = STRING_LIST_INIT_DUP;
+
+ string_list_split(&object_info_values, reader.line, ' ', -1);
+ if (size_index > 0) {
+ if (!strcmp(object_info_values.items[size_index].string, ""))
+ die("object-info: not our ref %s",
+ object_info_values.items[0].string);
+ *(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10);
+ }
+ i++;
+ }
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+
+ return 0;
+}
+
static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
struct transport_ls_refs_options *options)
{
@@ -392,8 +465,14 @@ static int fetch_refs_via_pack(struct transport *transport,
args.server_options = transport->server_options;
args.negotiation_tips = data->options.negotiation_tips;
args.reject_shallow_remote = transport->smart_options->reject_shallow;
+ args.object_info = transport->smart_options->object_info;
- if (!data->got_remote_heads) {
+ if (transport->smart_options && transport->smart_options->object_info) {
+ if (!fetch_object_info(transport, data->options.object_info_data))
+ goto cleanup;
+ ret = -1;
+ goto cleanup;
+ } else if (!data->got_remote_heads) {
int i;
int must_list_refs = 0;
for (i = 0; i < nr_heads; i++) {
diff --git a/transport.h b/transport.h
index 12bc08fc33..dbf60bb710 100644
--- a/transport.h
+++ b/transport.h
@@ -6,6 +6,7 @@
#include "remote.h"
#include "list-objects-filter-options.h"
#include "string-list.h"
+#include "object-store.h"
struct git_transport_options {
unsigned thin : 1;
@@ -31,6 +32,12 @@ struct git_transport_options {
*/
unsigned connectivity_checked:1;
+ /*
+ * Transport will attempt to pull only object-info. Fallbacks
+ * to pulling entire object if object-info is not supported
+ */
+ unsigned object_info : 1;
+
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
@@ -54,6 +61,10 @@ struct git_transport_options {
* common commits to this oidset instead of fetching any packfiles.
*/
struct oidset *acked_commits;
+
+ struct oid_array *object_info_oids;
+ struct object_info **object_info_data;
+ struct string_list *object_info_options;
};
enum transport_family {
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-02 17:09 ` [PATCH v4 5/8] transport: add client side capability to request object-info Calvin Wan
@ 2022-05-03 0:54 ` Junio C Hamano
2022-05-03 18:58 ` Calvin Wan
2022-05-03 23:15 ` Jonathan Tan
1 sibling, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-03 0:54 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> The question of version mismatch often comes up with client/server
> relationships. There are two cases to consider here (assuming either
> client or server functionality for object-info changes between the
> different versions):
>
> 1. client version > server version
> - client can request additional attributes from server
> 2. server version > client version
> - server can return additional info to the client
>
> The second case is a non-issue since the client would never be able to
> request additional info from the server.
That assumes that we can never remove attributes once we start
supporting them, so it is not a non-issue exactly. The way we make
it a non-issue is to make sure that the server returns no more than
what the client has asked, i.e. the same way we deal with the first
case.
> In order to solve the first
> case, recall an earlier patch where the server sends back the attributes
> even if no object ids are sent by the client. This allows the client to
> first check whether the server can accept the requested attributes
> before sending the entire request.
It requires two round-trips when the server does not know something
the client has asked, one without any object names, and then a real
request with the list of object names?
I think in practice, because the way the previous step is
structured, you can send a full request and in the non-error
codepath the server side will understand everything and give full
response back without an extra round-trip. Only in the error case,
the client will see the set of acceptable attributes and no object
data (and implicitly treat that as an indication of an error), but
then what would the client do after that happens?
- The client will just give up; the attributes it wanted but the
server couldn't supply are so fundamentally necessary to perform
whatever the client wanted to do.
- The client will retry the request by asking a known subset of
attributes for the same set of objects. This requires an extra
round-trip. If an earlier step didn't make .unknown case bypass
the response loop, this extra round-trip would have been totally
unnecessary.
> index 8c7752fc82..c27018d48c 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -69,6 +69,12 @@ struct fetch_pack_args {
> unsigned connectivity_checked:1;
> };
>
> +struct object_info_args {
> + const struct string_list *object_info_options;
> + const struct string_list *server_options;
> + const struct oid_array *oids;
> +};
These are const pointers because these lists are not owned by this
structure and are borrowed from somebody else?
> diff --git a/transport-helper.c b/transport-helper.c
> index b4dbbabb0c..093946f7fd 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport,
>
> /*
> * If we reach here, then the server, the client, and/or the transport
> - * helper does not support protocol v2. --negotiate-only requires
> - * protocol v2.
> + * helper does not support protocol v2. --negotiate-only and --object-info
> + * require protocol v2.
> */
OK.
> if (data->transport_options.acked_commits) {
> warning(_("--negotiate-only requires protocol v2"));
> return -1;
> }
> + if (transport->smart_options->object_info) {
> + die(_("--object-info requires protocol v2"));
> + }
OK.
> diff --git a/transport.c b/transport.c
> index 3d64a43ab3..08c505e1d0 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -353,6 +353,79 @@ static struct ref *handshake(struct transport *transport, int for_push,
> return refs;
> }
>
> +static int fetch_object_info(struct transport *transport, struct object_info **object_info_data)
> +{
> + size_t i;
> + int size_index = -1;
> + struct git_transport_data *data = transport->data;
> + struct object_info_args args;
> + struct packet_reader reader;
> + struct string_list server_attributes = STRING_LIST_INIT_DUP;
> +
> + memset(&args, 0, sizeof(args));
> + args.server_options = transport->server_options;
> + args.object_info_options = transport->smart_options->object_info_options;
It is unclear who fills smart_options->object_info_options and from
what. Are we referring to something that is not yet used, or worse
yet, code that is not yet written?
It seems that the result of applying 1-5/8 does not compile.
> + connect_setup(transport, 0);
> + packet_reader_init(&reader, data->fd[0], NULL, 0,
> + PACKET_READ_CHOMP_NEWLINE |
> + PACKET_READ_DIE_ON_ERR_PACKET);
> + data->version = discover_version(&reader);
> +
> + transport->hash_algo = reader.hash_algo;
> +
> + switch (data->version) {
> + case protocol_v2:
> + if (!server_supports_v2("object-info", 0))
> + return -1;
> + /**
> + * Send a request with only attributes first. If server can return all
> + * of the requested attributes, then send request with object ids
> + */
Doesn't it force the most pessimistic behaviour, i.e. always do a
likely-to-be-useless (when the feature becomes widely avaiable)
probe round-trip before making a real request?
> + send_object_info_request(data->fd[1], &args);
> + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
> + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> + return -1;
> + }
All these check_stateless_delimiter() calls in this patch are on
overly long lines.
> + string_list_split(&server_attributes, reader.line, ' ', -1);
> + packet_reader_read(&reader);
> + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> + if (server_attributes.nr != args.object_info_options->nr)
> + return -1;
So, I am guessing (without having the code that prepares the right
hand side) that we are comparing the number of attributes we asked
(on the right hand side) and the number of attributes the other side
said it supports in response to our request (on the left hand
side).
This does not protect against a server that responds with attribute
names duplicated or any somewhat odd third-party servers that are
not identical to what you wrote here. I would rather see us to loop
through object_info_options array and see if server_attributes list
says all of them are supported more explicitly. That can be done in
the loop below, where you'd figure out the "foo_index" for each
attribute "foo" you are supporting in the code.
> + for (i = 0; i < server_attributes.nr; i++) {
> + if (!strcmp(server_attributes.items[i].string, "size"))
> + size_index = i + 1;
> + }
size_index was initialized to -1 and I was expecting we can tell the
attribute is not used by checking (size_index < 0), but this seems
to make size_index 1 based, not 0 based. Intended?
I think this is to skip the object name that comes as the first item
in the response, but it would be more "pure" to keep foo_index
0-based and add the offset by whatever constant number of things
that come in front (currently, 1 for object name) near a lot closer
to where we parse and read the data, i.e. in the while() loop below.
> + args.oids = transport->smart_options->object_info_oids;
> + send_object_info_request(data->fd[1], &args);
And this is the second round-trip.
> + break;
> + case protocol_v1:
> + case protocol_v0:
> + die(_("wrong protocol version. expected v2"));
> + case protocol_unknown_version:
> + BUG("unknown protocol version");
> + }
> +
> + if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> + die(_("error reading object info header"));
> + i = 0;
> + while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
> +
> + string_list_split(&object_info_values, reader.line, ' ', -1);
> + if (size_index > 0) {
> + if (!strcmp(object_info_values.items[size_index].string, ""))
> + die("object-info: not our ref %s",
> + object_info_values.items[0].string);
> + *(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10);
> + }
> + i++;
> + }
> + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> +
> + return 0;
> +}
All of the above look written for the best case scenario (i.e. it
assumes there won't be any unexpected type of response), which makes
a good fuzz target. A malicious server side can throw non-number in
the "size" slot and strtoul() does not check for a malformatted
number, or it can return more response records than it was
requested, and overflow object_info_data[] array the caller of this
function prepared.
smart_options->object_info_oids may know how many response records
we should expect, so at least this while() loop should be able to
protect against an object_info_data[] overflow attack using it as
the upper limit of i.
> static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
> struct transport_ls_refs_options *options)
> {
> @@ -392,8 +465,14 @@ static int fetch_refs_via_pack(struct transport *transport,
> args.server_options = transport->server_options;
> args.negotiation_tips = data->options.negotiation_tips;
> args.reject_shallow_remote = transport->smart_options->reject_shallow;
> + args.object_info = transport->smart_options->object_info;
>
> - if (!data->got_remote_heads) {
> + if (transport->smart_options && transport->smart_options->object_info) {
> + if (!fetch_object_info(transport, data->options.object_info_data))
> + goto cleanup;
> + ret = -1;
> + goto cleanup;
> + } else if (!data->got_remote_heads) {
> int i;
> int must_list_refs = 0;
> for (i = 0; i < nr_heads; i++) {
> diff --git a/transport.h b/transport.h
> index 12bc08fc33..dbf60bb710 100644
> --- a/transport.h
> +++ b/transport.h
> @@ -6,6 +6,7 @@
> #include "remote.h"
> #include "list-objects-filter-options.h"
> #include "string-list.h"
> +#include "object-store.h"
>
> struct git_transport_options {
> unsigned thin : 1;
> @@ -31,6 +32,12 @@ struct git_transport_options {
> */
> unsigned connectivity_checked:1;
>
> + /*
> + * Transport will attempt to pull only object-info. Fallbacks
> + * to pulling entire object if object-info is not supported
> + */
> + unsigned object_info : 1;
> +
> int depth;
> const char *deepen_since;
> const struct string_list *deepen_not;
> @@ -54,6 +61,10 @@ struct git_transport_options {
> * common commits to this oidset instead of fetching any packfiles.
> */
> struct oidset *acked_commits;
> +
> + struct oid_array *object_info_oids;
> + struct object_info **object_info_data;
> + struct string_list *object_info_options;
> };
>
> enum transport_family {
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-03 0:54 ` Junio C Hamano
@ 2022-05-03 18:58 ` Calvin Wan
2022-05-04 15:42 ` Junio C Hamano
0 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-03 18:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86, me
> It requires two round-trips when the server does not know something
> the client has asked, one without any object names, and then a real
> request with the list of object names?
> I think in practice, because the way the previous step is
> structured, you can send a full request and in the non-error
> codepath the server side will understand everything and give full
> response back without an extra round-trip. Only in the error case,
> the client will see the set of acceptable attributes and no object
> data (and implicitly treat that as an indication of an error), but
> then what would the client do after that happens?
> - The client will just give up; the attributes it wanted but the
> server couldn't supply are so fundamentally necessary to perform
> whatever the client wanted to do.
> - The client will retry the request by asking a known subset of
> attributes for the same set of objects. This requires an extra
> round-trip. If an earlier step didn't make .unknown case bypass
> the response loop, this extra round-trip would have been totally
> unnecessary.
I do agree that the two round trips are unnecessary and I can send the
full request to begin with. The client (after patch 6) will fallback to
fetch after this initial request fails.
> These are const pointers because these lists are not owned by this
> structure and are borrowed from somebody else?
Yes git_transport_options holds them.
> It is unclear who fills smart_options->object_info_options and from
> what. Are we referring to something that is not yet used, or worse
> yet, code that is not yet written?
The code that calls this function is in patch 8. I decided to separate
out the transport side and the new command in cat-file --batch-command
to make it easier to explain to a reviewer how the individual parts
work. Looking holistically at your comments, should I have gone about
splitting the patches differently? It seems like I was a little too
aggressive this time, creating confusion by requiring the reviewer to
look at the next patch to answer some questions.
> It seems that the result of applying 1-5/8 does not compile.
I'll check why this is the case. I should get in the habit of compiling
all of my patches individually.
> size_index was initialized to -1 and I was expecting we can tell the
> attribute is not used by checking (size_index < 0), but this seems
> to make size_index 1 based, not 0 based. Intended?
Yes the server returns the packet as "object_id SP size" so the first
value is always the object_id.
> All of the above look written for the best case scenario (i.e. it
> assumes there won't be any unexpected type of response), which makes
> a good fuzz target.
I didn't take into account the possibility of a malicious server in my
implementation. Will go back through and protect against it.
On Mon, May 2, 2022 at 5:54 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Calvin Wan <calvinwan@google.com> writes:
>
> > The question of version mismatch often comes up with client/server
> > relationships. There are two cases to consider here (assuming either
> > client or server functionality for object-info changes between the
> > different versions):
> >
> > 1. client version > server version
> > - client can request additional attributes from server
> > 2. server version > client version
> > - server can return additional info to the client
> >
> > The second case is a non-issue since the client would never be able to
> > request additional info from the server.
>
> That assumes that we can never remove attributes once we start
> supporting them, so it is not a non-issue exactly. The way we make
> it a non-issue is to make sure that the server returns no more than
> what the client has asked, i.e. the same way we deal with the first
> case.
>
> > In order to solve the first
> > case, recall an earlier patch where the server sends back the attributes
> > even if no object ids are sent by the client. This allows the client to
> > first check whether the server can accept the requested attributes
> > before sending the entire request.
>
> It requires two round-trips when the server does not know something
> the client has asked, one without any object names, and then a real
> request with the list of object names?
>
> I think in practice, because the way the previous step is
> structured, you can send a full request and in the non-error
> codepath the server side will understand everything and give full
> response back without an extra round-trip. Only in the error case,
> the client will see the set of acceptable attributes and no object
> data (and implicitly treat that as an indication of an error), but
> then what would the client do after that happens?
>
> - The client will just give up; the attributes it wanted but the
> server couldn't supply are so fundamentally necessary to perform
> whatever the client wanted to do.
>
> - The client will retry the request by asking a known subset of
> attributes for the same set of objects. This requires an extra
> round-trip. If an earlier step didn't make .unknown case bypass
> the response loop, this extra round-trip would have been totally
> unnecessary.
>
> > index 8c7752fc82..c27018d48c 100644
> > --- a/fetch-pack.h
> > +++ b/fetch-pack.h
> > @@ -69,6 +69,12 @@ struct fetch_pack_args {
> > unsigned connectivity_checked:1;
> > };
> >
> > +struct object_info_args {
> > + const struct string_list *object_info_options;
> > + const struct string_list *server_options;
> > + const struct oid_array *oids;
> > +};
>
> These are const pointers because these lists are not owned by this
> structure and are borrowed from somebody else?
>
> > diff --git a/transport-helper.c b/transport-helper.c
> > index b4dbbabb0c..093946f7fd 100644
> > --- a/transport-helper.c
> > +++ b/transport-helper.c
> > @@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport,
> >
> > /*
> > * If we reach here, then the server, the client, and/or the transport
> > - * helper does not support protocol v2. --negotiate-only requires
> > - * protocol v2.
> > + * helper does not support protocol v2. --negotiate-only and --object-info
> > + * require protocol v2.
> > */
>
> OK.
>
> > if (data->transport_options.acked_commits) {
> > warning(_("--negotiate-only requires protocol v2"));
> > return -1;
> > }
> > + if (transport->smart_options->object_info) {
> > + die(_("--object-info requires protocol v2"));
> > + }
>
> OK.
>
> > diff --git a/transport.c b/transport.c
> > index 3d64a43ab3..08c505e1d0 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -353,6 +353,79 @@ static struct ref *handshake(struct transport *transport, int for_push,
> > return refs;
> > }
> >
> > +static int fetch_object_info(struct transport *transport, struct object_info **object_info_data)
> > +{
> > + size_t i;
> > + int size_index = -1;
> > + struct git_transport_data *data = transport->data;
> > + struct object_info_args args;
> > + struct packet_reader reader;
> > + struct string_list server_attributes = STRING_LIST_INIT_DUP;
> > +
> > + memset(&args, 0, sizeof(args));
> > + args.server_options = transport->server_options;
> > + args.object_info_options = transport->smart_options->object_info_options;
>
> It is unclear who fills smart_options->object_info_options and from
> what. Are we referring to something that is not yet used, or worse
> yet, code that is not yet written?
>
> It seems that the result of applying 1-5/8 does not compile.
>
> > + connect_setup(transport, 0);
> > + packet_reader_init(&reader, data->fd[0], NULL, 0,
> > + PACKET_READ_CHOMP_NEWLINE |
> > + PACKET_READ_DIE_ON_ERR_PACKET);
> > + data->version = discover_version(&reader);
> > +
> > + transport->hash_algo = reader.hash_algo;
> > +
> > + switch (data->version) {
> > + case protocol_v2:
> > + if (!server_supports_v2("object-info", 0))
> > + return -1;
> > + /**
> > + * Send a request with only attributes first. If server can return all
> > + * of the requested attributes, then send request with object ids
> > + */
>
> Doesn't it force the most pessimistic behaviour, i.e. always do a
> likely-to-be-useless (when the feature becomes widely avaiable)
> probe round-trip before making a real request?
>
> > + send_object_info_request(data->fd[1], &args);
> > + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
> > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> > + return -1;
> > + }
>
> All these check_stateless_delimiter() calls in this patch are on
> overly long lines.
>
> > + string_list_split(&server_attributes, reader.line, ' ', -1);
> > + packet_reader_read(&reader);
> > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> > + if (server_attributes.nr != args.object_info_options->nr)
> > + return -1;
>
> So, I am guessing (without having the code that prepares the right
> hand side) that we are comparing the number of attributes we asked
> (on the right hand side) and the number of attributes the other side
> said it supports in response to our request (on the left hand
> side).
>
> This does not protect against a server that responds with attribute
> names duplicated or any somewhat odd third-party servers that are
> not identical to what you wrote here. I would rather see us to loop
> through object_info_options array and see if server_attributes list
> says all of them are supported more explicitly. That can be done in
> the loop below, where you'd figure out the "foo_index" for each
> attribute "foo" you are supporting in the code.
>
> > + for (i = 0; i < server_attributes.nr; i++) {
> > + if (!strcmp(server_attributes.items[i].string, "size"))
> > + size_index = i + 1;
> > + }
>
> size_index was initialized to -1 and I was expecting we can tell the
> attribute is not used by checking (size_index < 0), but this seems
> to make size_index 1 based, not 0 based. Intended?
>
> I think this is to skip the object name that comes as the first item
> in the response, but it would be more "pure" to keep foo_index
> 0-based and add the offset by whatever constant number of things
> that come in front (currently, 1 for object name) near a lot closer
> to where we parse and read the data, i.e. in the while() loop below.
>
> > + args.oids = transport->smart_options->object_info_oids;
> > + send_object_info_request(data->fd[1], &args);
>
> And this is the second round-trip.
>
> > + break;
> > + case protocol_v1:
> > + case protocol_v0:
> > + die(_("wrong protocol version. expected v2"));
> > + case protocol_unknown_version:
> > + BUG("unknown protocol version");
> > + }
> > +
> > + if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> > + die(_("error reading object info header"));
> > + i = 0;
> > + while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
> > + struct string_list object_info_values = STRING_LIST_INIT_DUP;
> > +
> > + string_list_split(&object_info_values, reader.line, ' ', -1);
> > + if (size_index > 0) {
> > + if (!strcmp(object_info_values.items[size_index].string, ""))
> > + die("object-info: not our ref %s",
> > + object_info_values.items[0].string);
> > + *(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10);
> > + }
> > + i++;
> > + }
> > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> > +
> > + return 0;
> > +}
>
> All of the above look written for the best case scenario (i.e. it
> assumes there won't be any unexpected type of response), which makes
> a good fuzz target. A malicious server side can throw non-number in
> the "size" slot and strtoul() does not check for a malformatted
> number, or it can return more response records than it was
> requested, and overflow object_info_data[] array the caller of this
> function prepared.
>
> smart_options->object_info_oids may know how many response records
> we should expect, so at least this while() loop should be able to
> protect against an object_info_data[] overflow attack using it as
> the upper limit of i.
>
> > static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
> > struct transport_ls_refs_options *options)
> > {
> > @@ -392,8 +465,14 @@ static int fetch_refs_via_pack(struct transport *transport,
> > args.server_options = transport->server_options;
> > args.negotiation_tips = data->options.negotiation_tips;
> > args.reject_shallow_remote = transport->smart_options->reject_shallow;
> > + args.object_info = transport->smart_options->object_info;
> >
> > - if (!data->got_remote_heads) {
> > + if (transport->smart_options && transport->smart_options->object_info) {
> > + if (!fetch_object_info(transport, data->options.object_info_data))
> > + goto cleanup;
> > + ret = -1;
> > + goto cleanup;
> > + } else if (!data->got_remote_heads) {
> > int i;
> > int must_list_refs = 0;
> > for (i = 0; i < nr_heads; i++) {
> > diff --git a/transport.h b/transport.h
> > index 12bc08fc33..dbf60bb710 100644
> > --- a/transport.h
> > +++ b/transport.h
> > @@ -6,6 +6,7 @@
> > #include "remote.h"
> > #include "list-objects-filter-options.h"
> > #include "string-list.h"
> > +#include "object-store.h"
> >
> > struct git_transport_options {
> > unsigned thin : 1;
> > @@ -31,6 +32,12 @@ struct git_transport_options {
> > */
> > unsigned connectivity_checked:1;
> >
> > + /*
> > + * Transport will attempt to pull only object-info. Fallbacks
> > + * to pulling entire object if object-info is not supported
> > + */
> > + unsigned object_info : 1;
> > +
> > int depth;
> > const char *deepen_since;
> > const struct string_list *deepen_not;
> > @@ -54,6 +61,10 @@ struct git_transport_options {
> > * common commits to this oidset instead of fetching any packfiles.
> > */
> > struct oidset *acked_commits;
> > +
> > + struct oid_array *object_info_oids;
> > + struct object_info **object_info_data;
> > + struct string_list *object_info_options;
> > };
> >
> > enum transport_family {
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-03 18:58 ` Calvin Wan
@ 2022-05-04 15:42 ` Junio C Hamano
0 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-05-04 15:42 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
>> It seems that the result of applying 1-5/8 does not compile.
>
> I'll check why this is the case. I should get in the habit of compiling
> all of my patches individually.
For a 8-patch series, we should make sure that all 8 states resuting
from applying patches 1-thru-N for each N (1 <= N <= 8) builds and
tests OK for bisectability.
I often do not check that due to lack of time on the receiving end,
but for this series, I didn't understand the sudden appearance of
the object_info_options member in the code that didn't even seem to
be populated anywhere, and I noticed that the series was organized
incorrectly. Perhaps a simple rebase misordering?
>> size_index was initialized to -1 and I was expecting we can tell the
>> attribute is not used by checking (size_index < 0), but this seems
>> to make size_index 1 based, not 0 based. Intended?
>
> Yes the server returns the packet as "object_id SP size" so the first
> value is always the object_id.
I think this is to skip the object name that comes as the first item
in the response, but it would be more "pure" to keep foo_index
0-based and add the offset by whatever constant number of things
that come in front (currently, 1 for object name) near a lot closer
to where we parse and read the data, i.e. in the while() loop below.
IOW, the code that sets size_index based on the attribute query
response should not have to know how many fixed elements will come
before these attributes on the response lines. That knowledge
belongs to the code below:
> + i = 0;
> + while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
> +
> + string_list_split(&object_info_values, reader.line, ' ', -1);
> + if (size_index > 0) {
> + if (!strcmp(object_info_values.items[size_index].string, ""))
And here the code that uses size_index should be like
if (0 <= size_index &&
!strcmp(object_info_values.items[1 + size_index].string, ""))
...
if we wanted the logic to be more "pure" and keep foo_index 0-based.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-02 17:09 ` [PATCH v4 5/8] transport: add client side capability to request object-info Calvin Wan
2022-05-03 0:54 ` Junio C Hamano
@ 2022-05-03 23:15 ` Jonathan Tan
2022-05-04 15:50 ` Junio C Hamano
1 sibling, 1 reply; 103+ messages in thread
From: Jonathan Tan @ 2022-05-03 23:15 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, gitster, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch implements the client option for it. Currently,
> only 'size' is supported and the server must be v2, however future
> patches can implement additional options.
>
> The question of version mismatch often comes up with client/server
> relationships. There are two cases to consider here (assuming either
> client or server functionality for object-info changes between the
> different versions):
>
> 1. client version > server version
> - client can request additional attributes from server
> 2. server version > client version
> - server can return additional info to the client
>
> The second case is a non-issue since the client would never be able to
> request additional info from the server. In order to solve the first
> case, recall an earlier patch where the server sends back the attributes
> even if no object ids are sent by the client. This allows the client to
> first check whether the server can accept the requested attributes
> before sending the entire request.
From this description, it seems like the intention is to send an
object-info request, and then if the server responds in a certain way
(here, sending back the attributes - presumably without sending any
actual information), then we know that the server doesn't support our
request and we need to fall back. As Junio says [1], this requires one
RTT more than necessary. We could just determine if the server supports
the attributes we want by using capabilities (without needing to send
the request to check).
[1] https://lore.kernel.org/git/xmqqilqnsaep.fsf@gitster.g/
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 5/8] transport: add client side capability to request object-info
2022-05-03 23:15 ` Jonathan Tan
@ 2022-05-04 15:50 ` Junio C Hamano
0 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-05-04 15:50 UTC (permalink / raw)
To: Jonathan Tan; +Cc: Calvin Wan, git, philipoakley, johncai86, me
Jonathan Tan <jonathantanmy@google.com> writes:
> RTT more than necessary. We could just determine if the server supports
> the attributes we want by using capabilities (without needing to send
> the request to check).
Hmph. capabilities may cut in both ways, though.
Those clients that are not interested in object-info at all (which
are the majority of case where people clone, fetch or ls-remote),
they do not even need to know what kind of object attributes the
object-info command is prepared to report, and they would appreciate
not having to spend any extra byte in the capability-advertisement.
Of course, for object-info clients, having it upfront does make
things simpler.
So...
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v4 6/8] transport: add object-info fallback to fetch
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (4 preceding siblings ...)
2022-05-02 17:09 ` [PATCH v4 5/8] transport: add client side capability to request object-info Calvin Wan
@ 2022-05-02 17:09 ` Calvin Wan
2022-05-03 23:27 ` Jonathan Tan
2022-05-02 17:09 ` [PATCH v4 7/8] cat-file: move parse_cmd and DEFAULT_FORMAT up Calvin Wan
` (10 subsequent siblings)
16 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:09 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
If a v2 server does not support object-info or if the client requests
information that isn't supported by object-info, fetch the objects as if
it were a standard v2 fetch (however without changing any refs).
---
fetch-pack.c | 14 +++++++++++++-
fetch-pack.h | 2 ++
transport.c | 18 +++++++++++++++---
3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 506ca28e05..938d082b80 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1639,6 +1639,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
if (args->depth > 0 || args->deepen_since || args->deepen_not)
args->deepen = 1;
+ if (args->object_info)
+ state = FETCH_SEND_REQUEST;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
@@ -1648,7 +1651,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
- if (!args->refetch && everything_local(args, &ref))
+ if (!args->refetch && !args->object_info && everything_local(args, &ref))
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
@@ -2035,6 +2038,15 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
args->connectivity_checked = 1;
}
+ if (args->object_info) {
+ struct ref *ref_cpy_reader = ref_cpy;
+ int i = 0;
+ while (ref_cpy_reader) {
+ oid_object_info_extended(the_repository, &ref_cpy_reader->old_oid, &(*args->object_info_data)[i], OBJECT_INFO_LOOKUP_REPLACE);
+ ref_cpy_reader = ref_cpy_reader->next;
+ i++;
+ }
+ }
update_shallow(args, sought, nr_sought, &si);
cleanup:
clear_shallow_info(&si);
diff --git a/fetch-pack.h b/fetch-pack.h
index c27018d48c..552fd7bde0 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -17,6 +17,7 @@ struct fetch_pack_args {
const struct string_list *deepen_not;
struct list_objects_filter_options filter_options;
const struct string_list *server_options;
+ struct object_info **object_info_data;
/*
* If not NULL, during packfile negotiation, fetch-pack will send "have"
@@ -43,6 +44,7 @@ struct fetch_pack_args {
unsigned reject_shallow_remote:1;
unsigned deepen:1;
unsigned refetch:1;
+ unsigned object_info:1;
/*
* Indicate that the remote of this request is a promisor remote. The
diff --git a/transport.c b/transport.c
index 08c505e1d0..b85e52d9a8 100644
--- a/transport.c
+++ b/transport.c
@@ -436,10 +436,12 @@ static int fetch_refs_via_pack(struct transport *transport,
int nr_heads, struct ref **to_fetch)
{
int ret = 0;
+ size_t i;
struct git_transport_data *data = transport->data;
struct ref *refs = NULL;
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
+ struct ref *wanted_refs = xcalloc(1, sizeof (struct ref));
memset(&args, 0, sizeof(args));
args.uploadpack = data->options.uploadpack;
@@ -468,10 +470,19 @@ static int fetch_refs_via_pack(struct transport *transport,
args.object_info = transport->smart_options->object_info;
if (transport->smart_options && transport->smart_options->object_info) {
+ struct ref *ref = wanted_refs;
+
if (!fetch_object_info(transport, data->options.object_info_data))
goto cleanup;
- ret = -1;
- goto cleanup;
+ args.object_info_data = data->options.object_info_data;
+ for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
+ struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
+ temp_ref->old_oid = *(transport->smart_options->object_info_oids->oid + i);
+ temp_ref->exact_oid = 1;
+ ref->next = temp_ref;
+ ref = ref->next;
+ }
+ transport->remote_refs = wanted_refs->next;
} else if (!data->got_remote_heads) {
int i;
int must_list_refs = 0;
@@ -489,7 +500,7 @@ static int fetch_refs_via_pack(struct transport *transport,
else if (data->version <= protocol_v1)
die_if_server_options(transport);
- if (data->options.acked_commits) {
+ if (data->options.acked_commits && !transport->smart_options->object_info) {
if (data->version < protocol_v2) {
warning(_("--negotiate-only requires protocol v2"));
ret = -1;
@@ -532,6 +543,7 @@ static int fetch_refs_via_pack(struct transport *transport,
free_refs(refs_tmp);
free_refs(refs);
+ free_refs(wanted_refs);
return ret;
}
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 6/8] transport: add object-info fallback to fetch
2022-05-02 17:09 ` [PATCH v4 6/8] transport: add object-info fallback to fetch Calvin Wan
@ 2022-05-03 23:27 ` Jonathan Tan
0 siblings, 0 replies; 103+ messages in thread
From: Jonathan Tan @ 2022-05-03 23:27 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, gitster, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> If a v2 server does not support object-info or if the client requests
> information that isn't supported by object-info, fetch the objects as if
> it were a standard v2 fetch (however without changing any refs).
What do you mean by "however without changing any refs"? (I would have
expected that no refs would be changed, so this, to me, implies that we
would expect some refs to be changed.)
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 506ca28e05..938d082b80 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1639,6 +1639,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> if (args->depth > 0 || args->deepen_since || args->deepen_not)
> args->deepen = 1;
>
> + if (args->object_info)
> + state = FETCH_SEND_REQUEST;
> +
> while (state != FETCH_DONE) {
> switch (state) {
> case FETCH_CHECK_LOCAL:
> @@ -1648,7 +1651,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> /* Filter 'ref' by 'sought' and those that aren't local */
> mark_complete_and_common_ref(negotiator, args, &ref);
> filter_refs(args, &ref, sought, nr_sought);
> - if (!args->refetch && everything_local(args, &ref))
> + if (!args->refetch && !args->object_info && everything_local(args, &ref))
> state = FETCH_DONE;
> else
> state = FETCH_SEND_REQUEST;
I think that the purpose of all these changes is to skip certain steps
when we know that we're fetching as a fallback for object-info, but I
don't think they're necessary - it's fine to not fetch certain objects
if we have them present.
> @@ -2035,6 +2038,15 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
> args->connectivity_checked = 1;
> }
>
> + if (args->object_info) {
> + struct ref *ref_cpy_reader = ref_cpy;
> + int i = 0;
> + while (ref_cpy_reader) {
> + oid_object_info_extended(the_repository, &ref_cpy_reader->old_oid, &(*args->object_info_data)[i], OBJECT_INFO_LOOKUP_REPLACE);
> + ref_cpy_reader = ref_cpy_reader->next;
> + i++;
> + }
> + }
Could this part be done in the same function that falls back
(fetch_refs_via_pack(), I believe)? That would make the code easier to
understand - here, we don't know why we would need to copy the OIDs, but
in the function that falls back, we can look up and see that this is
done because we couldn't do something else.
> diff --git a/transport.c b/transport.c
> index 08c505e1d0..b85e52d9a8 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -436,10 +436,12 @@ static int fetch_refs_via_pack(struct transport *transport,
> int nr_heads, struct ref **to_fetch)
> {
> int ret = 0;
> + size_t i;
> struct git_transport_data *data = transport->data;
> struct ref *refs = NULL;
> struct fetch_pack_args args;
> struct ref *refs_tmp = NULL;
> + struct ref *wanted_refs = xcalloc(1, sizeof (struct ref));
If you only need these new variables in a block, declare them there (and
free them at the end of that block).
> @@ -489,7 +500,7 @@ static int fetch_refs_via_pack(struct transport *transport,
> else if (data->version <= protocol_v1)
> die_if_server_options(transport);
>
> - if (data->options.acked_commits) {
> + if (data->options.acked_commits && !transport->smart_options->object_info) {
> if (data->version < protocol_v2) {
> warning(_("--negotiate-only requires protocol v2"));
> ret = -1;
Is this addition necessary? --negotiate-only and object-info do not work
together.
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v4 7/8] cat-file: move parse_cmd and DEFAULT_FORMAT up
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (5 preceding siblings ...)
2022-05-02 17:09 ` [PATCH v4 6/8] transport: add object-info fallback to fetch Calvin Wan
@ 2022-05-02 17:09 ` Calvin Wan
2022-05-02 17:09 ` [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (9 subsequent siblings)
16 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:09 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
It would be useful for a future patch to use these definitions so moving
them higher.
---
builtin/cat-file.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..cfe74c36eb 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -34,6 +34,8 @@ struct batch_options {
const char *format;
};
+#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
+
static const char *force_path;
static int filter_object(const char *path, unsigned mode,
@@ -556,6 +558,16 @@ static void parse_cmd_info(struct batch_options *opt,
batch_one_object(line, output, opt, data);
}
+static const struct parse_cmd {
+ const char *name;
+ parse_cmd_fn_t fn;
+ unsigned takes_args;
+} commands[] = {
+ { "contents", parse_cmd_contents, 1},
+ { "info", parse_cmd_info, 1},
+ { "flush", NULL, 0},
+};
+
static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
@@ -583,17 +595,6 @@ static void free_cmds(struct queued_cmd *cmd, size_t *nr)
*nr = 0;
}
-
-static const struct parse_cmd {
- const char *name;
- parse_cmd_fn_t fn;
- unsigned takes_args;
-} commands[] = {
- { "contents", parse_cmd_contents, 1},
- { "info", parse_cmd_info, 1},
- { "flush", NULL, 0},
-};
-
static void batch_objects_command(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data)
@@ -659,8 +660,6 @@ static void batch_objects_command(struct batch_options *opt,
strbuf_release(&input);
}
-#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
-
static int batch_objects(struct batch_options *opt)
{
struct strbuf input = STRBUF_INIT;
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (6 preceding siblings ...)
2022-05-02 17:09 ` [PATCH v4 7/8] cat-file: move parse_cmd and DEFAULT_FORMAT up Calvin Wan
@ 2022-05-02 17:09 ` Calvin Wan
2022-05-04 21:27 ` Jonathan Tan
2022-07-31 15:24 ` ZheNing Hu
2022-07-28 23:02 ` [PATCH v5 0/6] " Calvin Wan
` (8 subsequent siblings)
16 siblings, 2 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-02 17:09 UTC (permalink / raw)
To: git; +Cc: gitster, jonathantanmy, philipoakley, johncai86, calvinwan, me
Since the `info` command in cat-file --batch-command prints object info
for a given object, it is natural to add another command in cat-file
--batch-command to print object info for a given object from a remote.
Add `remote-object-info` to cat-file --batch-command.
While `info` takes object ids one at a time, this creates overhead when
making requests to a server so `remote-object-info` instead can take
multiple object ids at once.
cat-file --batch-command is generally implemented in the following
manner:
- Receive and parse input from user
- Call respective function attached to command
- Set batch mode state, get object info, print object info
In --buffer mode, this changes to:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue
- Call respective function attached to command
- Set batch mode state, get object info, print object info
Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:
- Receive and parse input from user
If command is `remote-object-info`:
- Get object info from remote
- Loop through object info
- Call respective function attached to `info`
- Set batch mode state, use passed in object info, print object
info
Else:
- Call respective function attached to command
- Parse input, get object info, print object info
And finally for --buffer mode `remote-object-info`:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue:
If command is `remote-object-info`:
- Get object info from remote
- Loop through object info
- Call respective function attached to `info`
- Set batch mode state, use passed in object info, print
object info
Else:
- Call respective function attached to command
- Set batch mode state, get object info, print object info
To summarize, `remote-object-info` gets object info from the remote and
then generates multiples `info` commands with the object info passed in.
It cannot be implemented similarly to `info` and `content` because of
the overhead with remote commenication.
---
Documentation/git-cat-file.txt | 16 +-
builtin/cat-file.c | 200 +++++++++++++++----
t/t1006-cat-file.sh | 347 +++++++++++++++++++++++++++++++++
3 files changed, 522 insertions(+), 41 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..0d9e8e6214 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -115,6 +115,10 @@ info <object>::
Print object info for object reference `<object>`. This corresponds to the
output of `--batch-check`.
+remote-object-info <remote> [<object>]::
+ Print object info for object references `<object>` at specified <remote>.
+ This command may only be combined with `--buffer`.
+
flush::
Used with `--buffer` to execute all preceding commands that were issued
since the beginning or since the last flush was issued. When `--buffer`
@@ -245,7 +249,8 @@ newline. The available atoms are:
The full hex representation of the object name.
`objecttype`::
- The type of the object (the same as `cat-file -t` reports).
+ The type of the object (the same as `cat-file -t` reports). See
+ `CAVEATS` below.
`objectsize`::
The size, in bytes, of the object (the same as `cat-file -s`
@@ -253,13 +258,14 @@ newline. The available atoms are:
`objectsize:disk`::
The size, in bytes, that the object takes up on disk. See the
- note about on-disk sizes in the `CAVEATS` section below.
+ note about on-disk sizes in the `CAVEATS` section below. Not
+ supported by `remote-object-info`.
`deltabase`::
If the object is stored as a delta on-disk, this expands to the
full hex representation of the delta base object name.
Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
- below.
+ below. Not supported by `remote-object-info`.
`rest`::
If this atom is used in the output string, input lines are split
@@ -346,6 +352,10 @@ directory name.
CAVEATS
-------
+Note that since object type is currently not supported by the
+object-info command request, git fetches the entire object instead to
+get the object info.
+
Note that the sizes of objects on disk are reported accurately, but care
should be taken in drawing conclusions about which refs or objects are
responsible for disk usage. The size of a packed non-delta object may be
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index cfe74c36eb..7baa9bed61 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,9 @@
#include "packfile.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "alias.h"
+#include "remote.h"
+#include "transport.h"
enum batch_mode {
BATCH_MODE_CONTENTS,
@@ -32,11 +35,14 @@ struct batch_options {
int unordered;
int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
const char *format;
+ int use_remote_info;
};
#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
static const char *force_path;
+static struct object_info *remote_object_info;
+static struct oid_array object_info_oids = OID_ARRAY_INIT;
static int filter_object(const char *path, unsigned mode,
const struct object_id *oid,
@@ -425,48 +431,115 @@ static void batch_one_object(const char *obj_name,
int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
enum get_oid_result result;
- result = get_oid_with_context(the_repository, obj_name,
- flags, &data->oid, &ctx);
- if (result != FOUND) {
- switch (result) {
- case MISSING_OBJECT:
- printf("%s missing\n", obj_name);
- break;
- case SHORT_NAME_AMBIGUOUS:
- printf("%s ambiguous\n", obj_name);
- break;
- case DANGLING_SYMLINK:
- printf("dangling %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- case SYMLINK_LOOP:
- printf("loop %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- case NOT_DIR:
- printf("notdir %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- default:
- BUG("unknown get_sha1_with_context result %d\n",
- result);
- break;
+ if (!opt->use_remote_info) {
+ result = get_oid_with_context(the_repository, obj_name,
+ flags, &data->oid, &ctx);
+ if (result != FOUND) {
+ switch (result) {
+ case MISSING_OBJECT:
+ printf("%s missing\n", obj_name);
+ break;
+ case SHORT_NAME_AMBIGUOUS:
+ printf("%s ambiguous\n", obj_name);
+ break;
+ case DANGLING_SYMLINK:
+ printf("dangling %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ case SYMLINK_LOOP:
+ printf("loop %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ case NOT_DIR:
+ printf("notdir %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ default:
+ BUG("unknown get_sha1_with_context result %d\n",
+ result);
+ break;
+ }
+ fflush(stdout);
+ return;
}
- fflush(stdout);
- return;
- }
- if (ctx.mode == 0) {
- printf("symlink %"PRIuMAX"\n%s\n",
- (uintmax_t)ctx.symlink_path.len,
- ctx.symlink_path.buf);
- fflush(stdout);
- return;
+ if (ctx.mode == 0) {
+ printf("symlink %"PRIuMAX"\n%s\n",
+ (uintmax_t)ctx.symlink_path.len,
+ ctx.symlink_path.buf);
+ fflush(stdout);
+ return;
+ }
}
batch_object_write(obj_name, scratch, opt, data, NULL, 0);
}
+static int get_remote_info(struct batch_options *opt, int argc, const char **argv)
+{
+ int retval = 0;
+ size_t i;
+ struct remote *remote = NULL;
+ struct object_id oid;
+ struct string_list object_info_options = STRING_LIST_INIT_NODUP;
+ static struct transport *gtransport;
+ char *format = DEFAULT_FORMAT;
+
+ if (opt->format)
+ format = xstrdup(opt->format);
+
+ remote = remote_get(argv[0]);
+ if (!remote)
+ die(_("must supply valid remote when using --object-info"));
+ oid_array_clear(&object_info_oids);
+ for (i = 1; i < argc; i++) {
+ if (get_oid(argv[i], &oid))
+ die(_("malformed object id '%s'"), argv[i]);
+ oid_array_append(&object_info_oids, &oid);
+ }
+
+ gtransport = transport_get(remote, NULL);
+ if (gtransport->smart_options) {
+ size_t j;
+ int include_size = 0, include_type = 0;
+
+ remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
+ gtransport->smart_options->object_info = 1;
+ gtransport->smart_options->object_info_oids = &object_info_oids;
+ /**
+ * 'size' is the only option currently supported.
+ * Other options that are passed in the format will default to a
+ * standard fetch request rather than object-info.
+ */
+ if (strstr(format, "%(objectsize)")) {
+ string_list_append(&object_info_options, "size");
+ include_size = 1;
+ }
+ if (strstr(format, "%(objecttype)")) {
+ string_list_append(&object_info_options, "type");
+ include_type = 1;
+ }
+ if (strstr(format, "%(objectsize:disk)"))
+ die(_("objectsize:disk is currently not supported with remote-object-info"));
+ if (strstr(format, "%(deltabase)"))
+ die(_("deltabase is currently not supported with remote-object-info"));
+ if (object_info_options.nr > 0) {
+ gtransport->smart_options->object_info_options = &object_info_options;
+ for (j = 0; j < object_info_oids.nr; j++) {
+ if (include_size)
+ remote_object_info[j].sizep = xcalloc(1, sizeof(long));
+ if (include_type)
+ remote_object_info[j].typep = xcalloc(1, sizeof(enum object_type));
+ }
+ gtransport->smart_options->object_info_data = &remote_object_info;
+ retval = transport_fetch_refs(gtransport, NULL);
+ }
+ } else {
+ retval = -1;
+ }
+ return retval;
+}
+
struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
@@ -538,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
struct queued_cmd {
parse_cmd_fn_t fn;
char *line;
+ const char *name;
};
static void parse_cmd_contents(struct batch_options *opt,
@@ -565,9 +639,49 @@ static const struct parse_cmd {
} commands[] = {
{ "contents", parse_cmd_contents, 1},
{ "info", parse_cmd_info, 1},
+ { "remote-object-info", parse_cmd_info, 1},
{ "flush", NULL, 0},
};
+static void parse_remote_info(struct batch_options *opt,
+ char *line,
+ struct strbuf *output,
+ struct expand_data *data,
+ const struct parse_cmd *p_cmd,
+ struct queued_cmd *q_cmd)
+{
+ int count;
+ size_t i;
+ const char **argv;
+
+ count = split_cmdline(line, &argv);
+ if (get_remote_info(opt, count, argv))
+ goto cleanup;
+ opt->use_remote_info = 1;
+ data->skip_object_info = 1;
+ data->mark_query = 0;
+ for (i = 0; i < object_info_oids.nr; i++) {
+ if (remote_object_info[i].sizep)
+ data->size = *remote_object_info[i].sizep;
+ if (remote_object_info[i].typep)
+ data->type = *remote_object_info[i].typep;
+
+ data->oid = object_info_oids.oid[i];
+ if (p_cmd)
+ p_cmd->fn(opt, argv[i+1], output, data);
+ else
+ q_cmd->fn(opt, argv[i+1], output, data);
+ }
+ opt->use_remote_info = 0;
+ data->skip_object_info = 0;
+ data->mark_query = 1;
+
+cleanup:
+ for (i = 0; i < object_info_oids.nr; i++)
+ free_object_info_contents(&remote_object_info[i]);
+ free(remote_object_info);
+}
+
static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
@@ -579,8 +693,12 @@ static void dispatch_calls(struct batch_options *opt,
if (!opt->buffer_output)
die(_("flush is only for --buffer mode"));
- for (i = 0; i < nr; i++)
- cmd[i].fn(opt, cmd[i].line, output, data);
+ for (i = 0; i < nr; i++) {
+ if (!strcmp(cmd[i].name, "remote-object-info"))
+ parse_remote_info(opt, cmd[i].line, output, data, NULL, &cmd[i]);
+ else
+ cmd[i].fn(opt, cmd[i].line, output, data);
+ }
fflush(stdout);
}
@@ -640,11 +758,17 @@ static void batch_objects_command(struct batch_options *opt,
dispatch_calls(opt, output, data, queued_cmd, nr);
free_cmds(queued_cmd, &nr);
} else if (!opt->buffer_output) {
- cmd->fn(opt, p, output, data);
+ if (!strcmp(cmd->name, "remote-object-info")) {
+ char *line = xstrdup_or_null(p);
+ parse_remote_info(opt, line, output, data, cmd, NULL);
+ } else {
+ cmd->fn(opt, p, output, data);
+ }
} else {
ALLOC_GROW(queued_cmd, nr + 1, alloc);
call.fn = cmd->fn;
call.line = xstrdup_or_null(p);
+ call.name = cmd->name;
queued_cmd[nr++] = call;
}
}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 1b85207694..69ee3f8330 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -2,6 +2,9 @@
test_description='git cat-file'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
. ./test-lib.sh
test_cmdmode_usage () {
@@ -1087,4 +1090,348 @@ test_expect_success 'batch-command flush without --buffer' '
grep "^fatal:.*flush is only for --buffer mode.*" err
'
+# This section tests --batch-command with remote-object-info command
+# If a filter is not set, the filter defaults to "%(objectname) %(objectsize) %(objecttype)"
+# Since "%(objecttype)" is currently not supported by the command request, object-info,
+# the filters are set to "%(objectname) %(objectsize)".
+
+set_transport_variables () {
+ tree_sha1=$(git -C "$1" write-tree)
+ commit_sha1=$(echo_without_newline "$commit_message" | git -C "$1" commit-tree $tree_sha1)
+ tag_sha1=$(echo_without_newline "$tag_content" | git -C "$1" hash-object -t tag --stdin -w)
+ tag_size=$(strlen "$tag_content")
+}
+
+# Test --batch-command remote-object-info with 'git://' transport
+
+. "$TEST_DIRECTORY"/lib-git-daemon.sh
+start_git_daemon --export-all --enable=receive-pack
+daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
+
+test_expect_success 'create repo to be served by git-daemon' '
+ git init "$daemon_parent" &&
+ echo_without_newline "$hello_content" > $daemon_parent/hello &&
+ git -C "$daemon_parent" update-index --add hello
+'
+
+set_transport_variables "$daemon_parent"
+
+test_expect_success 'batch-command remote-object-info git://' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info git:// multiple sha1 per line' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// default filter' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info git://' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+stop_git_daemon
+
+# Test --batch-command remote-object-info with 'http://' transport
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create repo to be served by http:// transport' '
+ git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true &&
+ echo_without_newline "$hello_content" > $HTTPD_DOCUMENT_ROOT_PATH/http_parent/hello &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" update-index --add hello
+'
+set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent"
+
+test_expect_success 'batch-command remote-object-info http://' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// one line' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info http://' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// default filter' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'remote-object-info fails on unspported filter option (objectsize:disk)' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "objectsize:disk is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on unspported filter option (deltabase)' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "deltabase is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol fallback' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_i18ngrep "malformed object id '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID fallback' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_i18ngrep "malformed object id '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on missing OID' '
+ git clone "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" missing_oid_repo &&
+ test_commit -C missing_oid_repo message1 c.txt &&
+ (
+ cd missing_oid_repo &&
+ object_id=$(git rev-parse message1:c.txt) &&
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $object_id
+ EOF
+ test_i18ngrep "object-info: not our ref $object_id" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on missing OID fallback' '
+ (
+ cd missing_oid_repo &&
+ object_id=$(git rev-parse message1:c.txt) &&
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $object_id
+ EOF
+ test_i18ngrep "fatal: remote error: upload-pack: not our ref $object_id" err
+ )
+'
+
+# Test --batch-command remote-object-info with 'file://' transport
+
+test_expect_success 'create repo to be served by file:// transport' '
+ git init server &&
+ git -C server config protocol.version 2 &&
+ echo_without_newline "$hello_content" > server/hello &&
+ git -C server update-index --add hello
+'
+
+set_transport_variables "server"
+
+test_expect_success 'batch-command remote-object-info file://' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1
+ remote-object-info "file://$(pwd)" $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1
+ remote-object-info "file://$(pwd)" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// multiple sha1 per line' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info file://' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// default filter' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
test_done
--
2.36.0.rc2.10170.gb555eefa6f
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-02 17:09 ` [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command Calvin Wan
@ 2022-05-04 21:27 ` Jonathan Tan
2022-05-05 18:13 ` Calvin Wan
2022-07-31 15:24 ` ZheNing Hu
1 sibling, 1 reply; 103+ messages in thread
From: Jonathan Tan @ 2022-05-04 21:27 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, gitster, philipoakley, johncai86, me
Thanks for the patch. I'll limit myself to high-level comments for now,
since the design may need to change if we want to support mixing
remote-object-info with other commands in a --batch-command --buffer
invocation.
Calvin Wan <calvinwan@google.com> writes:
> Since the `info` command in cat-file --batch-command prints object info
> for a given object, it is natural to add another command in cat-file
> --batch-command to print object info for a given object from a remote.
> Add `remote-object-info` to cat-file --batch-command.
>
> While `info` takes object ids one at a time, this creates overhead when
> making requests to a server so `remote-object-info` instead can take
> multiple object ids at once.
>
> cat-file --batch-command is generally implemented in the following
> manner:
>
> - Receive and parse input from user
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
You mention below that this only works in --buffer mode, so I don't
think that this is relevant.
> In --buffer mode, this changes to:
>
> - Receive and parse input from user
> - Store respective function attached to command in a queue
> - After flush, loop through commands in queue
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
>
> Notice how the getting and printing of object info is accomplished one
> at a time. As described above, this creates a problem for making
> requests to a server. Therefore, `remote-object-info` is implemented in
> the following manner:
If you do want this command to work with --buffer and without, probably
clearer to replace "Notice" with "Notice that when cat-file is invoked
both without --buffer and with it". (Having said that, it makes sense to
me to only make it work with --buffer - making one request per OID would
probably not be something that the user would want.)
> - Receive and parse input from user
> If command is `remote-object-info`:
> - Get object info from remote
> - Loop through object info
> - Call respective function attached to `info`
> - Set batch mode state, use passed in object info, print object
> info
> Else:
> - Call respective function attached to command
> - Parse input, get object info, print object info
>
> And finally for --buffer mode `remote-object-info`:
> - Receive and parse input from user
> - Store respective function attached to command in a queue
> - After flush, loop through commands in queue:
> If command is `remote-object-info`:
> - Get object info from remote
> - Loop through object info
> - Call respective function attached to `info`
> - Set batch mode state, use passed in object info, print
> object info
> Else:
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
>
> To summarize, `remote-object-info` gets object info from the remote and
> then generates multiples `info` commands with the object info passed in.
> It cannot be implemented similarly to `info` and `content` because of
> the overhead with remote commenication.
What happens if remote-object-info is mixed with other commands?
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 24a811f0ef..0d9e8e6214 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -115,6 +115,10 @@ info <object>::
> Print object info for object reference `<object>`. This corresponds to the
> output of `--batch-check`.
>
> +remote-object-info <remote> [<object>]::
> + Print object info for object references `<object>` at specified <remote>.
> + This command may only be combined with `--buffer`.
Here you mention that we need --buffer.
> @@ -253,13 +258,14 @@ newline. The available atoms are:
>
> `objectsize:disk`::
> The size, in bytes, that the object takes up on disk. See the
> - note about on-disk sizes in the `CAVEATS` section below.
> + note about on-disk sizes in the `CAVEATS` section below. Not
> + supported by `remote-object-info`.
>
> `deltabase`::
> If the object is stored as a delta on-disk, this expands to the
> full hex representation of the delta base object name.
> Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
> - below.
> + below. Not supported by `remote-object-info`.
OK - makes sense that these are not supported by remote-object-info,
because the remote may not even store objects in delta format.
> @@ -32,11 +35,14 @@ struct batch_options {
> int unordered;
> int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
> const char *format;
> + int use_remote_info;
> };
>
> #define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
>
> static const char *force_path;
> +static struct object_info *remote_object_info;
> +static struct oid_array object_info_oids = OID_ARRAY_INIT;
Hmm...I would have expected the information to appear in the same data
structure used to store other command invocations like "contents",
"info", and "flush", not separately like this. This design doesn't seem
to support mixing commands.
> @@ -538,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> struct queued_cmd {
> parse_cmd_fn_t fn;
> char *line;
> + const char *name;
> };
Rather than store a pointer like this (and thus needing to keep track of
the lifetime of the memory pointed to by this pointer), if we only need
to know whether this queued_cmd is a remote-object-info call, we can
just use "unsigned is_remote_object_info : 1".
> @@ -565,9 +639,49 @@ static const struct parse_cmd {
> } commands[] = {
> { "contents", parse_cmd_contents, 1},
> { "info", parse_cmd_info, 1},
> + { "remote-object-info", parse_cmd_info, 1},
> { "flush", NULL, 0},
> };
Double parse_cmd_info?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-04 21:27 ` Jonathan Tan
@ 2022-05-05 18:13 ` Calvin Wan
2022-05-05 18:44 ` Junio C Hamano
0 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-05-05 18:13 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster, philipoakley, johncai86, me
> You mention below that this only works in --buffer mode, so I don't
> think that this is relevant.
It works in non-buffer mode as well. I meant that if you wanted to use
`remote-object-info` with another option, then --filter is the only
one that it works with. I tried to match the documentation for
--batch-command that says something similar:
--batch-command=<format>
Enter a command mode that reads commands and arguments from stdin. May
only be combined with --buffer, --textconv or --filters
> What happens if remote-object-info is mixed with other commands?
It works with other commands. The idea is that you would enter a line like:
remote-object-info <remote> <oid> <oid> ... <oid>
rather than
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
since each invocation of remote-object-info would cause another round
trip to the server.
> Hmm...I would have expected the information to appear in the same data
> structure used to store other command invocations like "contents",
> "info", and "flush", not separately like this. This design doesn't seem
> to support mixing commands.
Here's an example of how it works with other commands. Let's say I
call git cat-file --batch-command --buffer and input the following:
remote-object-info remote1 oid1 oid2 oid3
contents oid4
info oid5
remote-object-info remote2 oid6 oid7 oid8
flush
This would result in the following:
(call transport and store info in separate structure)
info oid1 (using passed in object-info)
info oid2
info oid3
contents oid4
info oid5
(call transport and store info in separate structure)
info oid6
info oid7
info oid8
> Rather than store a pointer like this (and thus needing to keep track of
> the lifetime of the memory pointed to by this pointer), if we only need
> to know whether this queued_cmd is a remote-object-info call, we can
> just use "unsigned is_remote_object_info : 1".
OK
>> @@ -565,9 +639,49 @@ static const struct parse_cmd {
>> } commands[] = {
>> { "contents", parse_cmd_contents, 1},
>> { "info", parse_cmd_info, 1},
>> + { "remote-object-info", parse_cmd_info, 1},
>> { "flush", NULL, 0},
>> };
> Double parse_cmd_info?
See above for why remote-object-info becomes info
On Wed, May 4, 2022 at 2:27 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks for the patch. I'll limit myself to high-level comments for now,
> since the design may need to change if we want to support mixing
> remote-object-info with other commands in a --batch-command --buffer
> invocation.
>
> Calvin Wan <calvinwan@google.com> writes:
> > Since the `info` command in cat-file --batch-command prints object info
> > for a given object, it is natural to add another command in cat-file
> > --batch-command to print object info for a given object from a remote.
> > Add `remote-object-info` to cat-file --batch-command.
> >
> > While `info` takes object ids one at a time, this creates overhead when
> > making requests to a server so `remote-object-info` instead can take
> > multiple object ids at once.
> >
> > cat-file --batch-command is generally implemented in the following
> > manner:
> >
> > - Receive and parse input from user
> > - Call respective function attached to command
> > - Set batch mode state, get object info, print object info
>
> You mention below that this only works in --buffer mode, so I don't
> think that this is relevant.
>
> > In --buffer mode, this changes to:
> >
> > - Receive and parse input from user
> > - Store respective function attached to command in a queue
> > - After flush, loop through commands in queue
> > - Call respective function attached to command
> > - Set batch mode state, get object info, print object info
> >
> > Notice how the getting and printing of object info is accomplished one
> > at a time. As described above, this creates a problem for making
> > requests to a server. Therefore, `remote-object-info` is implemented in
> > the following manner:
>
> If you do want this command to work with --buffer and without, probably
> clearer to replace "Notice" with "Notice that when cat-file is invoked
> both without --buffer and with it". (Having said that, it makes sense to
> me to only make it work with --buffer - making one request per OID would
> probably not be something that the user would want.)
>
> > - Receive and parse input from user
> > If command is `remote-object-info`:
> > - Get object info from remote
> > - Loop through object info
> > - Call respective function attached to `info`
> > - Set batch mode state, use passed in object info, print object
> > info
> > Else:
> > - Call respective function attached to command
> > - Parse input, get object info, print object info
> >
> > And finally for --buffer mode `remote-object-info`:
> > - Receive and parse input from user
> > - Store respective function attached to command in a queue
> > - After flush, loop through commands in queue:
> > If command is `remote-object-info`:
> > - Get object info from remote
> > - Loop through object info
> > - Call respective function attached to `info`
> > - Set batch mode state, use passed in object info, print
> > object info
> > Else:
> > - Call respective function attached to command
> > - Set batch mode state, get object info, print object info
> >
> > To summarize, `remote-object-info` gets object info from the remote and
> > then generates multiples `info` commands with the object info passed in.
> > It cannot be implemented similarly to `info` and `content` because of
> > the overhead with remote commenication.
>
> What happens if remote-object-info is mixed with other commands?
>
> > diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> > index 24a811f0ef..0d9e8e6214 100644
> > --- a/Documentation/git-cat-file.txt
> > +++ b/Documentation/git-cat-file.txt
> > @@ -115,6 +115,10 @@ info <object>::
> > Print object info for object reference `<object>`. This corresponds to the
> > output of `--batch-check`.
> >
> > +remote-object-info <remote> [<object>]::
> > + Print object info for object references `<object>` at specified <remote>.
> > + This command may only be combined with `--buffer`.
>
> Here you mention that we need --buffer.
>
> > @@ -253,13 +258,14 @@ newline. The available atoms are:
> >
> > `objectsize:disk`::
> > The size, in bytes, that the object takes up on disk. See the
> > - note about on-disk sizes in the `CAVEATS` section below.
> > + note about on-disk sizes in the `CAVEATS` section below. Not
> > + supported by `remote-object-info`.
> >
> > `deltabase`::
> > If the object is stored as a delta on-disk, this expands to the
> > full hex representation of the delta base object name.
> > Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
> > - below.
> > + below. Not supported by `remote-object-info`.
>
> OK - makes sense that these are not supported by remote-object-info,
> because the remote may not even store objects in delta format.
>
> > @@ -32,11 +35,14 @@ struct batch_options {
> > int unordered;
> > int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
> > const char *format;
> > + int use_remote_info;
> > };
> >
> > #define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
> >
> > static const char *force_path;
> > +static struct object_info *remote_object_info;
> > +static struct oid_array object_info_oids = OID_ARRAY_INIT;
>
> Hmm...I would have expected the information to appear in the same data
> structure used to store other command invocations like "contents",
> "info", and "flush", not separately like this. This design doesn't seem
> to support mixing commands.
>
> > @@ -538,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> > struct queued_cmd {
> > parse_cmd_fn_t fn;
> > char *line;
> > + const char *name;
> > };
>
> Rather than store a pointer like this (and thus needing to keep track of
> the lifetime of the memory pointed to by this pointer), if we only need
> to know whether this queued_cmd is a remote-object-info call, we can
> just use "unsigned is_remote_object_info : 1".
>
> > @@ -565,9 +639,49 @@ static const struct parse_cmd {
> > } commands[] = {
> > { "contents", parse_cmd_contents, 1},
> > { "info", parse_cmd_info, 1},
> > + { "remote-object-info", parse_cmd_info, 1},
> > { "flush", NULL, 0},
> > };
>
> Double parse_cmd_info?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-05 18:13 ` Calvin Wan
@ 2022-05-05 18:44 ` Junio C Hamano
2022-05-05 19:09 ` Junio C Hamano
0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-05 18:44 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, philipoakley, johncai86, me
Calvin Wan <calvinwan@google.com> writes:
> It works with other commands. The idea is that you would enter a line like:
>
> remote-object-info <remote> <oid> <oid> ... <oid>
>
> rather than
>
> remote-object-info <remote> <oid>
> remote-object-info <remote> <oid>
> remote-object-info <remote> <oid>
>
> since each invocation of remote-object-info would cause another round
> trip to the server.
Hmm. It looks unfortunate, but if we do not care about hitting the
pkt-line length limit, that might be OK. I dunno.
I would have expected to see something like
start batch
request 1
request 2
...
request 2000
flush batch
during which the other side patiently listens to our requests. They
(meaning the local process that reads the above and talks to a
remote as needed) can coalesce the requests of the same kind
(e.g. going to the same remote) while buffering and optimize their
operation without having the caller of them to worry about it that
way, no?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-05 18:44 ` Junio C Hamano
@ 2022-05-05 19:09 ` Junio C Hamano
2022-05-05 19:19 ` Calvin Wan
0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-05-05 19:09 UTC (permalink / raw)
To: Calvin Wan; +Cc: Jonathan Tan, git, philipoakley, johncai86, me
Junio C Hamano <gitster@pobox.com> writes:
> I would have expected to see something like
>
> start batch
> request 1
> request 2
> ...
> request 2000
> flush batch
>
> during which the other side patiently listens to our requests. They
> (meaning the local process that reads the above and talks to a
> remote as needed) can coalesce the requests of the same kind
> (e.g. going to the same remote) while buffering and optimize their
> operation without having the caller of them to worry about it that
> way, no?
Ah, that is in the batch mode, and you were syaing that "one request
with multiple objects" would allow multiple object-info requests to
be hanled efficiently even in non-batch mode. If that was what you
were talking about, I think that does make sense. Thanks.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-05 19:09 ` Junio C Hamano
@ 2022-05-05 19:19 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-05-05 19:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Tan, git, philipoakley, johncai86, me
> Ah, that is in the batch mode, and you were syaing that "one request
> with multiple objects" would allow multiple object-info requests to
> be hanled efficiently even in non-batch mode. If that was what you
> were talking about, I think that does make sense. Thanks.
Correct, also I think you mean "buffer" mode rather than "batch" mode.
On Thu, May 5, 2022 at 12:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I would have expected to see something like
> >
> > start batch
> > request 1
> > request 2
> > ...
> > request 2000
> > flush batch
> >
> > during which the other side patiently listens to our requests. They
> > (meaning the local process that reads the above and talks to a
> > remote as needed) can coalesce the requests of the same kind
> > (e.g. going to the same remote) while buffering and optimize their
> > operation without having the caller of them to worry about it that
> > way, no?
>
> Ah, that is in the batch mode, and you were syaing that "one request
> with multiple objects" would allow multiple object-info requests to
> be hanled efficiently even in non-batch mode. If that was what you
> were talking about, I think that does make sense. Thanks.
>
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-05-02 17:09 ` [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command Calvin Wan
2022-05-04 21:27 ` Jonathan Tan
@ 2022-07-31 15:24 ` ZheNing Hu
2022-08-08 17:43 ` Calvin Wan
1 sibling, 1 reply; 103+ messages in thread
From: ZheNing Hu @ 2022-07-31 15:24 UTC (permalink / raw)
To: Calvin Wan
Cc: Git List, Junio C Hamano, Jonathan Tan, Philip Oakley, johncai86,
Taylor Blau
Calvin Wan <calvinwan@google.com> 于2022年5月3日周二 06:33写道:
>
> Since the `info` command in cat-file --batch-command prints object info
> for a given object, it is natural to add another command in cat-file
> --batch-command to print object info for a given object from a remote.
> Add `remote-object-info` to cat-file --batch-command.
>
> While `info` takes object ids one at a time, this creates overhead when
> making requests to a server so `remote-object-info` instead can take
> multiple object ids at once.
>
> cat-file --batch-command is generally implemented in the following
> manner:
>
> - Receive and parse input from user
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
>
> In --buffer mode, this changes to:
>
> - Receive and parse input from user
> - Store respective function attached to command in a queue
> - After flush, loop through commands in queue
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
>
> Notice how the getting and printing of object info is accomplished one
> at a time. As described above, this creates a problem for making
> requests to a server. Therefore, `remote-object-info` is implemented in
> the following manner:
>
> - Receive and parse input from user
> If command is `remote-object-info`:
> - Get object info from remote
> - Loop through object info
> - Call respective function attached to `info`
> - Set batch mode state, use passed in object info, print object
> info
> Else:
> - Call respective function attached to command
> - Parse input, get object info, print object info
>
> And finally for --buffer mode `remote-object-info`:
> - Receive and parse input from user
> - Store respective function attached to command in a queue
> - After flush, loop through commands in queue:
> If command is `remote-object-info`:
> - Get object info from remote
> - Loop through object info
> - Call respective function attached to `info`
> - Set batch mode state, use passed in object info, print
> object info
> Else:
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
>
> To summarize, `remote-object-info` gets object info from the remote and
> then generates multiples `info` commands with the object info passed in.
> It cannot be implemented similarly to `info` and `content` because of
> the overhead with remote commenication.
>
> ---
> Documentation/git-cat-file.txt | 16 +-
> builtin/cat-file.c | 200 +++++++++++++++----
> t/t1006-cat-file.sh | 347 +++++++++++++++++++++++++++++++++
> 3 files changed, 522 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 24a811f0ef..0d9e8e6214 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -115,6 +115,10 @@ info <object>::
> Print object info for object reference `<object>`. This corresponds to the
> output of `--batch-check`.
>
> +remote-object-info <remote> [<object>]::
> + Print object info for object references `<object>` at specified <remote>.
> + This command may only be combined with `--buffer`.
> +
> flush::
> Used with `--buffer` to execute all preceding commands that were issued
> since the beginning or since the last flush was issued. When `--buffer`
> @@ -245,7 +249,8 @@ newline. The available atoms are:
> The full hex representation of the object name.
>
> `objecttype`::
> - The type of the object (the same as `cat-file -t` reports).
> + The type of the object (the same as `cat-file -t` reports). See
> + `CAVEATS` below.
>
> `objectsize`::
> The size, in bytes, of the object (the same as `cat-file -s`
> @@ -253,13 +258,14 @@ newline. The available atoms are:
>
> `objectsize:disk`::
> The size, in bytes, that the object takes up on disk. See the
> - note about on-disk sizes in the `CAVEATS` section below.
> + note about on-disk sizes in the `CAVEATS` section below. Not
> + supported by `remote-object-info`.
>
> `deltabase`::
> If the object is stored as a delta on-disk, this expands to the
> full hex representation of the delta base object name.
> Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
> - below.
> + below. Not supported by `remote-object-info`.
>
> `rest`::
> If this atom is used in the output string, input lines are split
Why not forbidden %(rest) here too?
> +static int get_remote_info(struct batch_options *opt, int argc, const char **argv)
> +{
> + int retval = 0;
> + size_t i;
> + struct remote *remote = NULL;
> + struct object_id oid;
> + struct string_list object_info_options = STRING_LIST_INIT_NODUP;
> + static struct transport *gtransport;
> + char *format = DEFAULT_FORMAT;
> +
> + if (opt->format)
> + format = xstrdup(opt->format);
> +
> + remote = remote_get(argv[0]);
> + if (!remote)
> + die(_("must supply valid remote when using --object-info"));
> + oid_array_clear(&object_info_oids);
> + for (i = 1; i < argc; i++) {
> + if (get_oid(argv[i], &oid))
> + die(_("malformed object id '%s'"), argv[i]);
> + oid_array_append(&object_info_oids, &oid);
> + }
> +
> + gtransport = transport_get(remote, NULL);
> + if (gtransport->smart_options) {
> + size_t j;
> + int include_size = 0, include_type = 0;
> +
> + remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
> + gtransport->smart_options->object_info = 1;
> + gtransport->smart_options->object_info_oids = &object_info_oids;
> + /**
> + * 'size' is the only option currently supported.
> + * Other options that are passed in the format will default to a
> + * standard fetch request rather than object-info.
> + */
> + if (strstr(format, "%(objectsize)")) {
> + string_list_append(&object_info_options, "size");
> + include_size = 1;
> + }
> + if (strstr(format, "%(objecttype)")) {
> + string_list_append(&object_info_options, "type");
> + include_type = 1;
> + }
> + if (strstr(format, "%(objectsize:disk)"))
> + die(_("objectsize:disk is currently not supported with remote-object-info"));
> + if (strstr(format, "%(deltabase)"))
> + die(_("deltabase is currently not supported with remote-object-info"));
%(rest) too?
> + if (object_info_options.nr > 0) {
> + gtransport->smart_options->object_info_options = &object_info_options;
> + for (j = 0; j < object_info_oids.nr; j++) {
> + if (include_size)
> + remote_object_info[j].sizep = xcalloc(1, sizeof(long));
> + if (include_type)
> + remote_object_info[j].typep = xcalloc(1, sizeof(enum object_type));
> + }
> + gtransport->smart_options->object_info_data = &remote_object_info;
> + retval = transport_fetch_refs(gtransport, NULL);
> + }
> + } else {
> + retval = -1;
> + }
> + return retval;
> +}
Maybe such code style will be better?
if (!gtransport->smart_options) {
return -1;
}
...
return transport_fetch_refs(gtransport, NULL);
> +
> struct object_cb_data {
> struct batch_options *opt;
> struct expand_data *expand;
> @@ -538,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> struct queued_cmd {
> parse_cmd_fn_t fn;
> char *line;
> + const char *name;
> };
>
> static void parse_cmd_contents(struct batch_options *opt,
> @@ -565,9 +639,49 @@ static const struct parse_cmd {
> } commands[] = {
> { "contents", parse_cmd_contents, 1},
> { "info", parse_cmd_info, 1},
> + { "remote-object-info", parse_cmd_info, 1},
> { "flush", NULL, 0},
> };
>
> +static void parse_remote_info(struct batch_options *opt,
> + char *line,
> + struct strbuf *output,
> + struct expand_data *data,
> + const struct parse_cmd *p_cmd,
> + struct queued_cmd *q_cmd)
> +{
> + int count;
> + size_t i;
> + const char **argv;
> +
> + count = split_cmdline(line, &argv);
> + if (get_remote_info(opt, count, argv))
> + goto cleanup;
> + opt->use_remote_info = 1;
> + data->skip_object_info = 1;
> + data->mark_query = 0;
> + for (i = 0; i < object_info_oids.nr; i++) {
> + if (remote_object_info[i].sizep)
> + data->size = *remote_object_info[i].sizep;
> + if (remote_object_info[i].typep)
> + data->type = *remote_object_info[i].typep;
> +
> + data->oid = object_info_oids.oid[i];
> + if (p_cmd)
> + p_cmd->fn(opt, argv[i+1], output, data);
> + else
> + q_cmd->fn(opt, argv[i+1], output, data);
> + }
> + opt->use_remote_info = 0;
> + data->skip_object_info = 0;
> + data->mark_query = 1;
> +
> +cleanup:
> + for (i = 0; i < object_info_oids.nr; i++)
> + free_object_info_contents(&remote_object_info[i]);
> + free(remote_object_info);
> +}
> +
> static void dispatch_calls(struct batch_options *opt,
> struct strbuf *output,
> struct expand_data *data,
> @@ -579,8 +693,12 @@ static void dispatch_calls(struct batch_options *opt,
> if (!opt->buffer_output)
> die(_("flush is only for --buffer mode"));
>
> - for (i = 0; i < nr; i++)
> - cmd[i].fn(opt, cmd[i].line, output, data);
> + for (i = 0; i < nr; i++) {
> + if (!strcmp(cmd[i].name, "remote-object-info"))
> + parse_remote_info(opt, cmd[i].line, output, data, NULL, &cmd[i]);
> + else
> + cmd[i].fn(opt, cmd[i].line, output, data);
> + }
>
> fflush(stdout);
> }
> @@ -640,11 +758,17 @@ static void batch_objects_command(struct batch_options *opt,
> dispatch_calls(opt, output, data, queued_cmd, nr);
> free_cmds(queued_cmd, &nr);
> } else if (!opt->buffer_output) {
> - cmd->fn(opt, p, output, data);
> + if (!strcmp(cmd->name, "remote-object-info")) {
> + char *line = xstrdup_or_null(p);
> + parse_remote_info(opt, line, output, data, cmd, NULL);
memory leak: "line".
> + } else {
> + cmd->fn(opt, p, output, data);
> + }
> } else {
> ALLOC_GROW(queued_cmd, nr + 1, alloc);
> call.fn = cmd->fn;
> call.line = xstrdup_or_null(p);
> + call.name = cmd->name;
> queued_cmd[nr++] = call;
> }
> }
> +
> +test_expect_success 'remote-object-info fails on unspported filter option (objectsize:disk)' '
> + (
> + cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +
> + test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
> + remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
> + EOF
> + test_i18ngrep "objectsize:disk is currently not supported with remote-object-info" err
> + )
> +'
> +
> +test_expect_success 'remote-object-info fails on unspported filter option (deltabase)' '
> + (
> + cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +
> + test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
> + remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
> + EOF
> + test_i18ngrep "deltabase is currently not supported with remote-object-info" err
> + )
> +'
> +
%(rest) too?
> +set_transport_variables "server"
> +
> +test_expect_success 'batch-command remote-object-info file://' '
> + (
> + cd server &&
> +
> + echo "$hello_sha1 $hello_size" >expect &&
> + echo "$tree_sha1 $tree_size" >>expect &&
> + echo "$commit_sha1 $commit_size" >>expect &&
> + echo "$tag_sha1 $tag_size" >>expect &&
> + git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
> + remote-object-info "file://$(pwd)" $hello_sha1
> + remote-object-info "file://$(pwd)" $tree_sha1
> + remote-object-info "file://$(pwd)" $commit_sha1
> + remote-object-info "file://$(pwd)" $tag_sha1
> + EOF
> + test_cmp expect actual
> + )
Can we support <rev> instead of only <oid> here?
$ git cat-file --batch-check
HEAD
28583b8d8ca72730d7c9e0ea50861ad431a6dea4 commit 3038
master
ab336e8f1c8009c8b1aab8deb592148e69217085 commit 281
origin/master
23b219f8e3f2adfb0441e135f0a880e6124f766c commit 282
origin/master:git.c
e5d62fa5a92e95e1ede041ebf913d841744c31f8 blob 28398
So cat-file --batch-check can support it.
$git cat-file --batch-commands
remote-object-info "file://$(pwd)" master:git.c
I guess it cannot work now, right?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
2022-07-31 15:24 ` ZheNing Hu
@ 2022-08-08 17:43 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-08 17:43 UTC (permalink / raw)
To: ZheNing Hu
Cc: Git List, Junio C Hamano, Jonathan Tan, Philip Oakley, johncai86,
Taylor Blau
> > `rest`::
> > If this atom is used in the output string, input lines are split
>
> Why not forbidden %(rest) here too?
Good catch. This should definitely be forbidden. I thought at first
this was inconsequential, but since I also have remote as part of
the input, this would no longer hold true.
> > + if (strstr(format, "%(objectsize:disk)"))
> > + die(_("objectsize:disk is currently not supported with remote-object-info"));
> > + if (strstr(format, "%(deltabase)"))
> > + die(_("deltabase is currently not supported with remote-object-info"));
>
> %(rest) too?
ditto
> Maybe such code style will be better?
>
> if (!gtransport->smart_options) {
> return -1;
> }
> ...
> return transport_fetch_refs(gtransport, NULL);
That does look better!
> > - cmd->fn(opt, p, output, data);
> > + if (!strcmp(cmd->name, "remote-object-info")) {
> > + char *line = xstrdup_or_null(p);
> > + parse_remote_info(opt, line, output, data, cmd, NULL);
>
> memory leak: "line".
ack
> > +test_expect_success 'remote-object-info fails on unspported filter option (deltabase)' '
> > + (
> > + cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> > +
> > + test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
> > + remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
> > + EOF
> > + test_i18ngrep "deltabase is currently not supported with remote-object-info" err
> > + )
> > +'
> > +
>
> %(rest) too?
ditto
> > +test_expect_success 'batch-command remote-object-info file://' '
> > + (
> > + cd server &&
> > +
> > + echo "$hello_sha1 $hello_size" >expect &&
> > + echo "$tree_sha1 $tree_size" >>expect &&
> > + echo "$commit_sha1 $commit_size" >>expect &&
> > + echo "$tag_sha1 $tag_size" >>expect &&
> > + git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
> > + remote-object-info "file://$(pwd)" $hello_sha1
> > + remote-object-info "file://$(pwd)" $tree_sha1
> > + remote-object-info "file://$(pwd)" $commit_sha1
> > + remote-object-info "file://$(pwd)" $tag_sha1
> > + EOF
> > + test_cmp expect actual
> > + )
>
> Can we support <rev> instead of only <oid> here?
Not at the current moment. The server is unable to handle
anything besides oids.
>
> $ git cat-file --batch-check
> HEAD
> 28583b8d8ca72730d7c9e0ea50861ad431a6dea4 commit 3038
> master
> ab336e8f1c8009c8b1aab8deb592148e69217085 commit 281
> origin/master
> 23b219f8e3f2adfb0441e135f0a880e6124f766c commit 282
> origin/master:git.c
> e5d62fa5a92e95e1ede041ebf913d841744c31f8 blob 28398
>
> So cat-file --batch-check can support it.
>
> $git cat-file --batch-commands
> remote-object-info "file://$(pwd)" master:git.c
>
> I guess it cannot work now, right?
Correct.
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (7 preceding siblings ...)
2022-05-02 17:09 ` [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-28 23:56 ` Junio C Hamano
` (2 more replies)
2022-07-28 23:02 ` [PATCH v5 1/6] fetch-pack: refactor packet writing Calvin Wan
` (7 subsequent siblings)
16 siblings, 3 replies; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
Sometimes it is useful to get information about an object without having
to download it completely. The server logic has already been implemented
as “a2ba162cda (object-info: support for retrieving object info,
2021-04-20)”. This patch implements the client option for it.
Add `--object-info` option to `cat-file --batch-command`. This option
allows the client to make an object-info command request to a server
that supports protocol v2. If the server is v2, but does not allow for
the object-info command request, the entire object is fetched and the
relevant object info is returned.
=== Changes since v4 ===
- Instead of making 2 rounds trips to determine whether the server can
handle the client object-info request, the server now advertises the
features it can send back so the client can immediately fallback to
fetch if any feature is not supported
- Fixed nits and clarified some documentation/commit messages
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Calvin Wan (6):
fetch-pack: refactor packet writing
fetch-pack: move fetch initialization
protocol-caps: initialization bug fix
serve: advertise object-info feature
transport: add client support for object-info
cat-file: add remote-object-info to batch-command
Documentation/git-cat-file.txt | 16 +-
builtin/cat-file.c | 225 ++++++++++++++++-----
fetch-pack.c | 52 +++--
fetch-pack.h | 10 +
object-file.c | 11 ++
object-store.h | 3 +
protocol-caps.c | 2 +-
serve.c | 10 +-
t/t1006-cat-file.sh | 348 +++++++++++++++++++++++++++++++++
t/t5555-http-smart-common.sh | 2 +-
t/t5701-git-serve.sh | 2 +-
transport-helper.c | 7 +-
transport.c | 110 ++++++++++-
transport.h | 11 ++
14 files changed, 734 insertions(+), 75 deletions(-)
base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-07-28 23:02 ` [PATCH v5 0/6] " Calvin Wan
@ 2022-07-28 23:56 ` Junio C Hamano
2022-07-29 0:02 ` Junio C Hamano
2022-07-31 8:41 ` Phillip Wood
2022-09-30 23:23 ` Junio C Hamano
2 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-07-28 23:56 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch implements the client option for it.
>
> Add `--object-info` option to `cat-file --batch-command`. This option
> allows the client to make an object-info command request to a server
> that supports protocol v2. If the server is v2, but does not allow for
> the object-info command request, the entire object is fetched and the
> relevant object info is returned.
Thanks. Since this has changed significantly enough to render
range-diff less useful, I think a fresh round of full review may be
needed. I'll use a more recent tip of 'master' as the base when I
queue these patches.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-07-28 23:56 ` Junio C Hamano
@ 2022-07-29 0:02 ` Junio C Hamano
0 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-07-29 0:02 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Junio C Hamano <gitster@pobox.com> writes:
> Thanks. Since this has changed significantly enough to render
> range-diff less useful, I think a fresh round of full review may be
> needed. I'll use a more recent tip of 'master' as the base when I
> queue these patches.
Having said that, here is a diff between
a/ is previous round queued with fixups merged to 'master'
b/ is the latest round queued on top of 'master'.
which may help people to see what got changed since the previous
round.
I notice that use of CALLOC_ARRAY() suggested during the review of
the last round was dropped (it should be easy to resurrect).
Documentation/git-cat-file.txt | 2 +-
builtin/cat-file.c | 4 +--
fetch-pack.c | 17 +++--------
fetch-pack.h | 4 +--
object-file.c | 1 -
protocol-caps.c | 12 ++++----
serve.c | 10 +++++-
t/t1006-cat-file.sh | 3 +-
t/t5555-http-smart-common.sh | 2 +-
t/t5701-git-serve.sh | 2 +-
transport.c | 69 +++++++++++++++++++++++++-----------------
11 files changed, 70 insertions(+), 56 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 0d9e8e6214..c073d5e50d 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -117,7 +117,7 @@ info <object>::
remote-object-info <remote> [<object>]::
Print object info for object references `<object>` at specified <remote>.
- This command may only be combined with `--buffer`.
+ This command may be combined with `--buffer`.
flush::
Used with `--buffer` to execute all preceding commands that were issued
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 48bf0ee02f..57c090f249 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -515,12 +515,12 @@ static int get_remote_info(struct batch_options *opt, int argc, const char **arg
size_t j;
int include_size = 0, include_type = 0;
- CALLOC_ARRAY(remote_object_info, object_info_oids.nr);
+ remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
gtransport->smart_options->object_info = 1;
gtransport->smart_options->object_info_oids = &object_info_oids;
/**
* 'size' is the only option currently supported.
- * Other options that are passed in the format will default to a
+ * Other options that are passed in the format will fallback to a
* standard fetch request rather than object-info.
*/
if (strstr(format, "%(objectsize)")) {
diff --git a/fetch-pack.c b/fetch-pack.c
index ca06a7130f..d373aed775 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1296,14 +1296,16 @@ static void write_command_and_capabilities(struct strbuf *req_buf,
void send_object_info_request(int fd_out, struct object_info_args *args)
{
struct strbuf req_buf = STRBUF_INIT;
- struct string_list unsorted_object_info_options = *args->object_info_options;
size_t i;
write_command_and_capabilities(&req_buf, args->server_options, "object-info");
- if (unsorted_string_list_has_string(&unsorted_object_info_options, "size"))
+ if (unsorted_string_list_has_string(args->object_info_options, "size"))
packet_buf_write(&req_buf, "size");
+ if (unsorted_string_list_has_string(args->object_info_options, "type"))
+ packet_buf_write(&req_buf, "type");
+
if (args->oids) {
for (i = 0; i < args->oids->nr; i++)
packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
@@ -1669,7 +1671,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
- if (!args->refetch && !args->object_info && everything_local(args, &ref))
+ if (!args->refetch && everything_local(args, &ref))
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
@@ -2056,15 +2058,6 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
args->connectivity_checked = 1;
}
- if (args->object_info) {
- struct ref *ref_cpy_reader = ref_cpy;
- int i = 0;
- while (ref_cpy_reader) {
- oid_object_info_extended(the_repository, &ref_cpy_reader->old_oid, &(*args->object_info_data)[i], OBJECT_INFO_LOOKUP_REPLACE);
- ref_cpy_reader = ref_cpy_reader->next;
- i++;
- }
- }
update_shallow(args, sought, nr_sought, &si);
cleanup:
clear_shallow_info(&si);
diff --git a/fetch-pack.h b/fetch-pack.h
index 552fd7bde0..11c513f748 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -72,9 +72,9 @@ struct fetch_pack_args {
};
struct object_info_args {
- const struct string_list *object_info_options;
+ struct string_list *object_info_options;
const struct string_list *server_options;
- const struct oid_array *oids;
+ struct oid_array *oids;
};
/*
diff --git a/object-file.c b/object-file.c
index 333fed9f68..417c76def3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2844,7 +2844,6 @@ void free_object_info_contents(struct object_info *object_info)
{
if (!object_info)
return;
-
free(object_info->typep);
free(object_info->sizep);
free(object_info->disk_sizep);
diff --git a/protocol-caps.c b/protocol-caps.c
index bc7def0727..2ad9f45c59 100644
--- a/protocol-caps.c
+++ b/protocol-caps.c
@@ -11,7 +11,6 @@
struct requested_info {
unsigned size : 1;
- unsigned unknown : 1;
};
/*
@@ -41,12 +40,12 @@ static void send_info(struct repository *r, struct packet_writer *writer,
struct string_list_item *item;
struct strbuf send_buffer = STRBUF_INIT;
+ if (!oid_str_list->nr)
+ return;
+
if (info->size)
packet_writer_write(writer, "size");
- if (info->unknown || !oid_str_list->nr)
- goto release;
-
for_each_string_list_item (item, oid_str_list) {
const char *oid_str = item->string;
struct object_id oid;
@@ -73,7 +72,6 @@ static void send_info(struct repository *r, struct packet_writer *writer,
packet_writer_write(writer, "%s", send_buffer.buf);
strbuf_reset(&send_buffer);
}
-release:
strbuf_release(&send_buffer);
}
@@ -94,7 +92,9 @@ int cap_object_info(struct repository *r, struct packet_reader *request)
if (parse_oid(request->line, &oid_str_list))
continue;
- info.unknown = 1;
+ packet_writer_error(&writer,
+ "object-info: unexpected line: '%s'",
+ request->line);
}
if (request->status != PACKET_READ_FLUSH) {
diff --git a/serve.c b/serve.c
index 733347f602..1adf9df4a8 100644
--- a/serve.c
+++ b/serve.c
@@ -56,6 +56,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
return 1;
}
+static int object_info_advertise(struct repository *r,
+ struct strbuf *value)
+{
+ if (value)
+ strbuf_addstr(value, "size");
+ return 1;
+}
+
static void session_id_receive(struct repository *r,
const char *client_sid)
{
@@ -132,7 +140,7 @@ static struct protocol_capability capabilities[] = {
},
{
.name = "object-info",
- .advertise = always_advertise,
+ .advertise = object_info_advertise,
.command = cap_object_info,
},
};
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f0baea07c..55b2a1f06f 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1100,6 +1100,7 @@ test_expect_success 'batch-command flush without --buffer' '
# If a filter is not set, the filter defaults to "%(objectname) %(objectsize) %(objecttype)"
# Since "%(objecttype)" is currently not supported by the command request, object-info,
# the filters are set to "%(objectname) %(objectsize)".
+# Tests with the default filter are used to test the fallback to 'fetch' command
set_transport_variables () {
tree_sha1=$(git -C "$1" write-tree)
@@ -1163,7 +1164,7 @@ test_expect_success 'batch-command remote-object-info http:// default filter' '
echo "$tree_sha1 tree $tree_size" >>expect &&
echo "$commit_sha1 commit $commit_size" >>expect &&
echo "$tag_sha1 tag $tag_size" >>expect &&
- git cat-file --batch-command >actual <<-EOF &&
+ GIT_TRACE_PACKET=1 git cat-file --batch-command >actual <<-EOF &&
remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1
remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1 $tag_sha1
EOF
diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh
index b1cfe8b7db..5a16d4259a 100755
--- a/t/t5555-http-smart-common.sh
+++ b/t/t5555-http-smart-common.sh
@@ -131,7 +131,7 @@ test_expect_success 'git upload-pack --advertise-refs: v2' '
fetch=shallow wait-for-done
server-option
object-format=$(test_oid algo)
- object-info
+ object-info=size
0000
EOF
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 1896f671cb..ebb32644e3 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -20,7 +20,7 @@ test_expect_success 'test capability advertisement' '
fetch=shallow wait-for-done
server-option
object-format=$(test_oid algo)
- object-info
+ object-info=size
0000
EOF
diff --git a/transport.c b/transport.c
index 1baa1a194a..64bcc311ff 100644
--- a/transport.c
+++ b/transport.c
@@ -360,15 +360,16 @@ static int fetch_object_info(struct transport *transport, struct object_info **o
struct git_transport_data *data = transport->data;
struct object_info_args args;
struct packet_reader reader;
- struct string_list server_attributes = STRING_LIST_INIT_DUP;
memset(&args, 0, sizeof(args));
args.server_options = transport->server_options;
args.object_info_options = transport->smart_options->object_info_options;
+ args.oids = transport->smart_options->object_info_oids;
connect_setup(transport, 0);
packet_reader_init(&reader, data->fd[0], NULL, 0,
PACKET_READ_CHOMP_NEWLINE |
+ PACKET_READ_GENTLE_ON_EOF |
PACKET_READ_DIE_ON_ERR_PACKET);
data->version = discover_version(&reader);
@@ -378,25 +379,14 @@ static int fetch_object_info(struct transport *transport, struct object_info **o
case protocol_v2:
if (!server_supports_v2("object-info", 0))
return -1;
- /**
- * Send a request with only attributes first. If server can return all
- * of the requested attributes, then send request with object ids
- */
- send_object_info_request(data->fd[1], &args);
- if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
- check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+ if (unsorted_string_list_has_string(args.object_info_options, "size")
+ && !server_supports_feature("object-info", "size", 0)) {
return -1;
}
- string_list_split(&server_attributes, reader.line, ' ', -1);
- packet_reader_read(&reader);
- check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
- if (server_attributes.nr != args.object_info_options->nr)
+ if (unsorted_string_list_has_string(args.object_info_options, "type")
+ && !server_supports_feature("object-info", "type", 0)) {
return -1;
- for (i = 0; i < server_attributes.nr; i++) {
- if (!strcmp(server_attributes.items[i].string, "size"))
- size_index = i + 1;
}
- args.oids = transport->smart_options->object_info_oids;
send_object_info_request(data->fd[1], &args);
break;
case protocol_v1:
@@ -406,18 +396,29 @@ static int fetch_object_info(struct transport *transport, struct object_info **o
BUG("unknown protocol version");
}
- if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
- die(_("error reading object info header"));
+ for (i = 0; i < args.object_info_options->nr; i++) {
+ if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+ return -1;
+ }
+ if (unsorted_string_list_has_string(args.object_info_options, reader.line)) {
+ if (!strcmp(reader.line, "size"))
+ size_index = i;
+ continue;
+ }
+ return -1;
+ }
+
i = 0;
- while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
+ while (packet_reader_read(&reader) == PACKET_READ_NORMAL && i < args.oids->nr) {
struct string_list object_info_values = STRING_LIST_INIT_DUP;
string_list_split(&object_info_values, reader.line, ' ', -1);
- if (size_index > 0) {
- if (!strcmp(object_info_values.items[size_index].string, ""))
+ if (0 <= size_index) {
+ if (!strcmp(object_info_values.items[1 + size_index].string, ""))
die("object-info: not our ref %s",
object_info_values.items[0].string);
- *(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10);
+ *(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 + size_index].string, NULL, 10);
}
i++;
}
@@ -441,7 +442,7 @@ static int fetch_refs_via_pack(struct transport *transport,
struct ref *refs = NULL;
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
- struct ref *wanted_refs = xcalloc(1, sizeof (struct ref));
+ struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
memset(&args, 0, sizeof(args));
args.uploadpack = data->options.uploadpack;
@@ -470,11 +471,13 @@ static int fetch_refs_via_pack(struct transport *transport,
args.object_info = transport->smart_options->object_info;
if (transport->smart_options && transport->smart_options->object_info) {
- struct ref *ref = wanted_refs;
+ struct ref *ref = object_info_refs;
if (!fetch_object_info(transport, data->options.object_info_data))
goto cleanup;
args.object_info_data = data->options.object_info_data;
+ args.quiet = 1;
+ args.no_progress = 1;
for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
temp_ref->old_oid = *(transport->smart_options->object_info_oids->oid + i);
@@ -482,7 +485,7 @@ static int fetch_refs_via_pack(struct transport *transport,
ref->next = temp_ref;
ref = ref->next;
}
- transport->remote_refs = wanted_refs->next;
+ transport->remote_refs = object_info_refs->next;
} else if (!data->got_remote_heads) {
int i;
int must_list_refs = 0;
@@ -500,7 +503,7 @@ static int fetch_refs_via_pack(struct transport *transport,
else if (data->version <= protocol_v1)
die_if_server_options(transport);
- if (data->options.acked_commits && !transport->smart_options->object_info) {
+ if (data->options.acked_commits) {
if (data->version < protocol_v2) {
warning(_("--negotiate-only requires protocol v2"));
ret = -1;
@@ -523,12 +526,22 @@ static int fetch_refs_via_pack(struct transport *transport,
to_fetch, nr_heads, &data->shallow,
&transport->pack_lockfiles, data->version);
+ if (args.object_info) {
+ struct ref *ref_cpy_reader = object_info_refs->next;
+ int i = 0;
+ while (ref_cpy_reader) {
+ oid_object_info_extended(the_repository, &ref_cpy_reader->old_oid, &(*args.object_info_data)[i], OBJECT_INFO_LOOKUP_REPLACE);
+ ref_cpy_reader = ref_cpy_reader->next;
+ i++;
+ }
+ }
+
data->got_remote_heads = 0;
data->options.self_contained_and_connected =
args.self_contained_and_connected;
data->options.connectivity_checked = args.connectivity_checked;
- if (!refs)
+ if (refs == NULL && !args.object_info)
ret = -1;
if (report_unmatched_refs(to_fetch, nr_heads))
ret = -1;
@@ -543,7 +556,7 @@ static int fetch_refs_via_pack(struct transport *transport,
free_refs(refs_tmp);
free_refs(refs);
- free_refs(wanted_refs);
+ free_refs(object_info_refs);
return ret;
}
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-07-28 23:02 ` [PATCH v5 0/6] " Calvin Wan
2022-07-28 23:56 ` Junio C Hamano
@ 2022-07-31 8:41 ` Phillip Wood
2022-08-04 22:57 ` Calvin Wan
2022-09-30 23:23 ` Junio C Hamano
2 siblings, 1 reply; 103+ messages in thread
From: Phillip Wood @ 2022-07-31 8:41 UTC (permalink / raw)
To: Calvin Wan, git; +Cc: gitster, jonathantanmy, philipoakley, johncai86
Hi Calvin
On 29/07/2022 00:02, Calvin Wan wrote:
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch implements the client option for it.
>
> Add `--object-info` option to `cat-file --batch-command`.
This part of the cover letter seems to have been left over from a
previous iteration. I noticed that in patch 6 there is a reference to
'--object-info' in an error message as well.
This might be a daft question but could we teach cat-file to check if an
object is available locally and if it is not retrieve just the metadata
from the remote rather than adding a new command to --batch-command?
That way it would be transparent to the user and they would not have to
worry about whether they might be querying a remote object or not.
Best Wishes
Phillip
> This option
> allows the client to make an object-info command request to a server
> that supports protocol v2. If the server is v2, but does not allow for
> the object-info command request, the entire object is fetched and the
> relevant object info is returned.
>
> === Changes since v4 ===
> - Instead of making 2 rounds trips to determine whether the server can
> handle the client object-info request, the server now advertises the
> features it can send back so the client can immediately fallback to
> fetch if any feature is not supported
> - Fixed nits and clarified some documentation/commit messages
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
>
> Calvin Wan (6):
> fetch-pack: refactor packet writing
> fetch-pack: move fetch initialization
> protocol-caps: initialization bug fix
> serve: advertise object-info feature
> transport: add client support for object-info
> cat-file: add remote-object-info to batch-command
>
> Documentation/git-cat-file.txt | 16 +-
> builtin/cat-file.c | 225 ++++++++++++++++-----
> fetch-pack.c | 52 +++--
> fetch-pack.h | 10 +
> object-file.c | 11 ++
> object-store.h | 3 +
> protocol-caps.c | 2 +-
> serve.c | 10 +-
> t/t1006-cat-file.sh | 348 +++++++++++++++++++++++++++++++++
> t/t5555-http-smart-common.sh | 2 +-
> t/t5701-git-serve.sh | 2 +-
> transport-helper.c | 7 +-
> transport.c | 110 ++++++++++-
> transport.h | 11 ++
> 14 files changed, 734 insertions(+), 75 deletions(-)
>
>
> base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-07-31 8:41 ` Phillip Wood
@ 2022-08-04 22:57 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-04 22:57 UTC (permalink / raw)
To: phillip.wood; +Cc: git, gitster, jonathantanmy, philipoakley, johncai86
> This might be a daft question but could we teach cat-file to check if an
> object is available locally and if it is not retrieve just the metadata
> from the remote rather than adding a new command to --batch-command?
> That way it would be transparent to the user and they would not have to
> worry about whether they might be querying a remote object or not.
Definitely a good question! If we did this over a large amount of objects and
we didn't have those objects locally, then we would have to query for each
of them, resulting in a lot of transport overhead. Therefore, I needed to use
--batch-command to allow me to query all at once. This also begs the
question of why I don't check for the objects locally before querying the
server. The intention of my patch assumes that the user querying for
object-info knows they don't have the object locally and would therefore
like to see the size of an object they could potentially fetch. This could
definitely be potential future work (which I should mention in my commit
message!)
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command
2022-07-28 23:02 ` [PATCH v5 0/6] " Calvin Wan
2022-07-28 23:56 ` Junio C Hamano
2022-07-31 8:41 ` Phillip Wood
@ 2022-09-30 23:23 ` Junio C Hamano
2 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-09-30 23:23 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch implements the client option for it.
>
> Add `--object-info` option to `cat-file --batch-command`. This option
> allows the client to make an object-info command request to a server
> that supports protocol v2. If the server is v2, but does not allow for
> the object-info command request, the entire object is fetched and the
> relevant object info is returned.
>
> === Changes since v4 ===
> - Instead of making 2 rounds trips to determine whether the server can
> handle the client object-info request, the server now advertises the
> features it can send back so the client can immediately fallback to
> fetch if any feature is not supported
> - Fixed nits and clarified some documentation/commit messages
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
FWIW, this one seems to make t1006 fail under SANITIZE=address, i.e.
$ SANITIZE=address make test
I thought it would be better to report this before you send out v6
after the release.
Thanks.
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v5 1/6] fetch-pack: refactor packet writing
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (8 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 0/6] " Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-28 23:02 ` [PATCH v5 2/6] fetch-pack: move fetch initialization Calvin Wan
` (6 subsequent siblings)
16 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
A subsequent patch will need to write capabilities for another command,
so refactor write_fetch_command_and_capabilities() to be used by both
fetch and future command.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index cb6647d657..5176420b62 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1261,13 +1261,13 @@ static int add_haves(struct fetch_negotiator *negotiator,
return haves_added;
}
-static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
+static void write_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options, const char* command)
{
const char *hash_name;
- if (server_supports_v2("fetch", 1))
- packet_buf_write(req_buf, "command=fetch");
+ if (server_supports_v2(command, 1))
+ packet_buf_write(req_buf, "command=%s", command);
if (server_supports_v2("agent", 0))
packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
if (advertise_sid && server_supports_v2("session-id", 0))
@@ -1303,7 +1303,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
int done_sent = 0;
struct strbuf req_buf = STRBUF_INIT;
- write_fetch_command_and_capabilities(&req_buf, args->server_options);
+ write_command_and_capabilities(&req_buf, args->server_options, "fetch");
if (args->use_thin_pack)
packet_buf_write(&req_buf, "thin-pack");
@@ -2090,7 +2090,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
int received_ready = 0;
strbuf_reset(&req_buf);
- write_fetch_command_and_capabilities(&req_buf, server_options);
+ write_command_and_capabilities(&req_buf, server_options, "fetch");
packet_buf_write(&req_buf, "wait-for-done");
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* [PATCH v5 2/6] fetch-pack: move fetch initialization
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (9 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 1/6] fetch-pack: refactor packet writing Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-28 23:02 ` [PATCH v5 3/6] protocol-caps: initialization bug fix Calvin Wan
` (5 subsequent siblings)
16 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
There are some variables that are initialized at the start of the
do_fetch_pack_v2() state machine. Currently, these are initialized
in FETCH_CHECK_LOCAL, which is the initial state set at the beginning
of the function.
However, a subsequent patch will allow for another initial state,
while still requiring these initialized variables. Therefore, move
the initialization to before the state machine, so that they are set
regardless of the initial state.
Note that there is no change in behavior, because we're moving code
from the beginning of the first state to just before the execution of
the state machine.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 5176420b62..8c862b017e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1628,18 +1628,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
reader.me = "fetch-pack";
}
+ /* v2 supports these by default */
+ allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
+ use_sideband = 2;
+ if (args->depth > 0 || args->deepen_since || args->deepen_not)
+ args->deepen = 1;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
- /* v2 supports these by default */
- allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
- use_sideband = 2;
- if (args->depth > 0 || args->deepen_since || args->deepen_not)
- args->deepen = 1;
-
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* [PATCH v5 3/6] protocol-caps: initialization bug fix
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (10 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 2/6] fetch-pack: move fetch initialization Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-29 17:51 ` Junio C Hamano
2022-07-28 23:02 ` [PATCH v5 4/6] serve: advertise object-info feature Calvin Wan
` (4 subsequent siblings)
16 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
Initialize info. If info.size bit was on due to on-stack garbage,
we would have given our response with "size" attribute prefixed,
even when the client side never requested it.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
protocol-caps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/protocol-caps.c b/protocol-caps.c
index bbde91810a..2ad9f45c59 100644
--- a/protocol-caps.c
+++ b/protocol-caps.c
@@ -77,7 +77,7 @@ static void send_info(struct repository *r, struct packet_writer *writer,
int cap_object_info(struct repository *r, struct packet_reader *request)
{
- struct requested_info info;
+ struct requested_info info = { 0 };
struct packet_writer writer;
struct string_list oid_str_list = STRING_LIST_INIT_DUP;
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v5 3/6] protocol-caps: initialization bug fix
2022-07-28 23:02 ` [PATCH v5 3/6] protocol-caps: initialization bug fix Calvin Wan
@ 2022-07-29 17:51 ` Junio C Hamano
0 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-07-29 17:51 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> Initialize info. If info.size bit was on due to on-stack garbage,
> we would have given our response with "size" attribute prefixed,
> even when the client side never requested it.
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> protocol-caps.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
OK, this is new in this round and it makes sense.
> diff --git a/protocol-caps.c b/protocol-caps.c
> index bbde91810a..2ad9f45c59 100644
> --- a/protocol-caps.c
> +++ b/protocol-caps.c
> @@ -77,7 +77,7 @@ static void send_info(struct repository *r, struct packet_writer *writer,
>
> int cap_object_info(struct repository *r, struct packet_reader *request)
> {
> - struct requested_info info;
> + struct requested_info info = { 0 };
> struct packet_writer writer;
> struct string_list oid_str_list = STRING_LIST_INIT_DUP;
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v5 4/6] serve: advertise object-info feature
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (11 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 3/6] protocol-caps: initialization bug fix Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-29 17:57 ` Junio C Hamano
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
` (3 subsequent siblings)
16 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
In order for a client to know what object-info components a server can
provide, advertise supported object-info features. This will allow a
client to decide whether to query the server for object-info or fetch as
a fallback.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
serve.c | 10 +++++++++-
t/t5555-http-smart-common.sh | 2 +-
t/t5701-git-serve.sh | 2 +-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/serve.c b/serve.c
index 733347f602..1adf9df4a8 100644
--- a/serve.c
+++ b/serve.c
@@ -56,6 +56,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
return 1;
}
+static int object_info_advertise(struct repository *r,
+ struct strbuf *value)
+{
+ if (value)
+ strbuf_addstr(value, "size");
+ return 1;
+}
+
static void session_id_receive(struct repository *r,
const char *client_sid)
{
@@ -132,7 +140,7 @@ static struct protocol_capability capabilities[] = {
},
{
.name = "object-info",
- .advertise = always_advertise,
+ .advertise = object_info_advertise,
.command = cap_object_info,
},
};
diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh
index b1cfe8b7db..5a16d4259a 100755
--- a/t/t5555-http-smart-common.sh
+++ b/t/t5555-http-smart-common.sh
@@ -131,7 +131,7 @@ test_expect_success 'git upload-pack --advertise-refs: v2' '
fetch=shallow wait-for-done
server-option
object-format=$(test_oid algo)
- object-info
+ object-info=size
0000
EOF
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 1896f671cb..ebb32644e3 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -20,7 +20,7 @@ test_expect_success 'test capability advertisement' '
fetch=shallow wait-for-done
server-option
object-format=$(test_oid algo)
- object-info
+ object-info=size
0000
EOF
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v5 4/6] serve: advertise object-info feature
2022-07-28 23:02 ` [PATCH v5 4/6] serve: advertise object-info feature Calvin Wan
@ 2022-07-29 17:57 ` Junio C Hamano
2022-08-01 18:28 ` Calvin Wan
0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-07-29 17:57 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> In order for a client to know what object-info components a server can
> provide, advertise supported object-info features. This will allow a
> client to decide whether to query the server for object-info or fetch as
> a fallback.
OK. Now we will no longer advertise a bare "object-info", but
"object-info=size" (and possibly in the future things other than
"size"). How would this change affect older clients who knows what
to do with "object-info" but not "object-info=<values>" yet?
> +static int object_info_advertise(struct repository *r,
> + struct strbuf *value)
> +{
> + if (value)
> + strbuf_addstr(value, "size");
> + return 1;
> +}
> +
> static void session_id_receive(struct repository *r,
> const char *client_sid)
> {
> @@ -132,7 +140,7 @@ static struct protocol_capability capabilities[] = {
> },
> {
> .name = "object-info",
> - .advertise = always_advertise,
> + .advertise = object_info_advertise,
> .command = cap_object_info,
> },
> };
> diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh
> index b1cfe8b7db..5a16d4259a 100755
> --- a/t/t5555-http-smart-common.sh
> +++ b/t/t5555-http-smart-common.sh
> @@ -131,7 +131,7 @@ test_expect_success 'git upload-pack --advertise-refs: v2' '
> fetch=shallow wait-for-done
> server-option
> object-format=$(test_oid algo)
> - object-info
> + object-info=size
> 0000
> EOF
>
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index 1896f671cb..ebb32644e3 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -20,7 +20,7 @@ test_expect_success 'test capability advertisement' '
> fetch=shallow wait-for-done
> server-option
> object-format=$(test_oid algo)
> - object-info
> + object-info=size
> 0000
> EOF
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 4/6] serve: advertise object-info feature
2022-07-29 17:57 ` Junio C Hamano
@ 2022-08-01 18:28 ` Calvin Wan
2022-08-01 18:44 ` Ævar Arnfjörð Bjarmason
2022-08-01 18:47 ` Junio C Hamano
0 siblings, 2 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-01 18:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86
> OK. Now we will no longer advertise a bare "object-info", but
> "object-info=size" (and possibly in the future things other than
> "size"). How would this change affect older clients who knows what
> to do with "object-info" but not "object-info=<values>" yet?
This was a tricky tradeoff that I definitely think I should have
discussed more in the commit message. The issue with how object
info is currently implemented is that it is very inflexible for adding
new parameters.
This is how object-info currently parses a client request:
while (packet_reader_read(request) == PACKET_READ_NORMAL) {
if (!strcmp("size", request->line)) {
info.size = 1;
continue;
}
if (parse_oid(request->line, &oid_str_list))
continue;
packet_writer_error(&writer,
"object-info: unexpected line: '%s'",
request->line);
}
Object-info supports "size" right now but, let's say I want to add
"type" as a parameter. OK I add another if statement like:
if (!strcmp("type", request->line)) {
info.type = 1;
continue;
}
And we update the docs to say "type" is now supported by
object-info. If a user now attempts to request "size" and "type"
from a server that has not been updated to support "type",
then the user gets an error message "object-info: unexpected
line: 'type'", which is another situation that is a bad experience
for older clients. The client has no way of knowing that their
failure is caused by a server version issue.
Essentially I think at some point we have to bite the bullet and say
we need to rework some part of the object-info advertisement (or
if anyone has a better idea of achieving the same goal) so that we
can make future incremental changes to object-info. If the supported
parameters are posted in the advertisement, then the client doesn't
have to first make a request to find out that their requested
parameter isn't support by the server. While you noted that we can't
make the assumption now that nobody is using the current
object-info feature, I think the benefit of the change outweighs
the cost of affecting the possibly small amount of users of this
feature. (A quick search on stack overflow for "object-info" tagged
under [git] returned no questions about it so that's what I used as
a cursory estimate for how popular this feature is).
Curious to hear what your thoughts on this are Junio, since as
much as I'd like to create a seamless upgrade experience for
older clients, I'm out of ideas as to how I would do so.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 4/6] serve: advertise object-info feature
2022-08-01 18:28 ` Calvin Wan
@ 2022-08-01 18:44 ` Ævar Arnfjörð Bjarmason
2022-08-01 18:47 ` Junio C Hamano
1 sibling, 0 replies; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-01 18:44 UTC (permalink / raw)
To: Calvin Wan; +Cc: Junio C Hamano, git, jonathantanmy, philipoakley, johncai86
On Mon, Aug 01 2022, Calvin Wan wrote:
>> OK. Now we will no longer advertise a bare "object-info", but
>> "object-info=size" (and possibly in the future things other than
>> "size"). How would this change affect older clients who knows what
>> to do with "object-info" but not "object-info=<values>" yet?
>
> This was a tricky tradeoff that I definitely think I should have
> discussed more in the commit message. The issue with how object
> info is currently implemented is that it is very inflexible for adding
> new parameters.
>
> This is how object-info currently parses a client request:
>
> while (packet_reader_read(request) == PACKET_READ_NORMAL) {
> if (!strcmp("size", request->line)) {
> info.size = 1;
> continue;
> }
>
> if (parse_oid(request->line, &oid_str_list))
> continue;
>
> packet_writer_error(&writer,
> "object-info: unexpected line: '%s'",
> request->line);
> }
>
> Object-info supports "size" right now but, let's say I want to add
> "type" as a parameter. OK I add another if statement like:
>
> if (!strcmp("type", request->line)) {
> info.type = 1;
> continue;
> }
>
> And we update the docs to say "type" is now supported by
> object-info. If a user now attempts to request "size" and "type"
> from a server that has not been updated to support "type",
> then the user gets an error message "object-info: unexpected
> line: 'type'", which is another situation that is a bad experience
> for older clients. The client has no way of knowing that their
> failure is caused by a server version issue.
>
> Essentially I think at some point we have to bite the bullet and say
> we need to rework some part of the object-info advertisement (or
> if anyone has a better idea of achieving the same goal) so that we
> can make future incremental changes to object-info. If the supported
> parameters are posted in the advertisement, then the client doesn't
> have to first make a request to find out that their requested
> parameter isn't support by the server. While you noted that we can't
> make the assumption now that nobody is using the current
> object-info feature, I think the benefit of the change outweighs
> the cost of affecting the possibly small amount of users of this
> feature. (A quick search on stack overflow for "object-info" tagged
> under [git] returned no questions about it so that's what I used as
> a cursory estimate for how popular this feature is).
>
> Curious to hear what your thoughts on this are Junio, since as
> much as I'd like to create a seamless upgrade experience for
> older clients, I'm out of ideas as to how I would do so.
I haven't looked deeply into this case, but in general for such protcol
incompatibilities we could just create an object-info2, and have that
use some extensible calling convention we wish we'd have used from day
1.
Then have new clients understand both (and prefer the new verb), and
older clients use object-info without breakage.
Or we could call the new thing "cat-file", and have it accept any
arbitrary options it does, and then limit it to some sensible subset for
now :)
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 4/6] serve: advertise object-info feature
2022-08-01 18:28 ` Calvin Wan
2022-08-01 18:44 ` Ævar Arnfjörð Bjarmason
@ 2022-08-01 18:47 ` Junio C Hamano
2022-08-01 18:58 ` Calvin Wan
1 sibling, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-08-01 18:47 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> And we update the docs to say "type" is now supported by
> object-info. If a user now attempts to request "size" and "type"
> from a server that has not been updated to support "type",
> then the user gets an error message "object-info: unexpected
> line: 'type'", which is another situation that is a bad experience
> for older clients.
Is it? older clients by definition does not know about "type", no?
I am perfectly happy with the capability advertisement to say not
just "object-info", but also what attributes of objects you can be
queried, by the way. If clients right now (i.e. without this
series) expect that the other side who advertises "object-info"
without "=<these>,<attributes>,<are>,<supported>" supports only
"size", for example, perhaps treat the bare "object-info" capability
just as a synonym of "object-info=size", or something?
I do not deeply care either way, to be honest, as the deployed
clients with bare "object-info" will not survive for a long time
and will quickly be deprecated anyway. I just wanted to see the
reasoning behind the decision to ignore them.
Thanks.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 4/6] serve: advertise object-info feature
2022-08-01 18:47 ` Junio C Hamano
@ 2022-08-01 18:58 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-01 18:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86
> Is it? older clients by definition does not know about "type", no?
Newer client, older remote server was the situation I was thinking
about
> I am perfectly happy with the capability advertisement to say not
> just "object-info", but also what attributes of objects you can be
> queried, by the way. If clients right now (i.e. without this
> series) expect that the other side who advertises "object-info"
> without "=<these>,<attributes>,<are>,<supported>" supports only
> "size", for example, perhaps treat the bare "object-info" capability
> just as a synonym of "object-info=size", or something?
This is a good idea! Thanks
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v5 5/6] transport: add client support for object-info
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (12 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 4/6] serve: advertise object-info feature Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-29 18:06 ` Junio C Hamano
` (3 more replies)
2022-07-28 23:02 ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
` (2 subsequent siblings)
16 siblings, 4 replies; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
Sometimes it is useful to get information about an object without having
to download it completely. The server logic has already been implemented
as “a2ba162cda (object-info: support for retrieving object info,
2021-04-20)”. This patch adds client functions to communicate with the
server.
The client currently supports requesting a list of object ids with
features 'size' and 'type' from a v2 server. If a server does not
advertise either of the requested features, then the client falls back
to making the request through 'fetch'.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
fetch-pack.c | 28 ++++++++++++
fetch-pack.h | 10 +++++
transport-helper.c | 7 ++-
transport.c | 110 +++++++++++++++++++++++++++++++++++++++++++--
transport.h | 11 +++++
5 files changed, 161 insertions(+), 5 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 8c862b017e..d373aed775 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1293,6 +1293,31 @@ static void write_command_and_capabilities(struct strbuf *req_buf,
packet_buf_delim(req_buf);
}
+void send_object_info_request(int fd_out, struct object_info_args *args)
+{
+ struct strbuf req_buf = STRBUF_INIT;
+ size_t i;
+
+ write_command_and_capabilities(&req_buf, args->server_options, "object-info");
+
+ if (unsorted_string_list_has_string(args->object_info_options, "size"))
+ packet_buf_write(&req_buf, "size");
+
+ if (unsorted_string_list_has_string(args->object_info_options, "type"))
+ packet_buf_write(&req_buf, "type");
+
+ if (args->oids) {
+ for (i = 0; i < args->oids->nr; i++)
+ packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
+ }
+
+ packet_buf_flush(&req_buf);
+ if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
+ die_errno(_("unable to write request to remote"));
+
+ strbuf_release(&req_buf);
+}
+
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
struct fetch_pack_args *args,
const struct ref *wants, struct oidset *common,
@@ -1634,6 +1659,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
if (args->depth > 0 || args->deepen_since || args->deepen_not)
args->deepen = 1;
+ if (args->object_info)
+ state = FETCH_SEND_REQUEST;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
diff --git a/fetch-pack.h b/fetch-pack.h
index 8c7752fc82..11c513f748 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -17,6 +17,7 @@ struct fetch_pack_args {
const struct string_list *deepen_not;
struct list_objects_filter_options filter_options;
const struct string_list *server_options;
+ struct object_info **object_info_data;
/*
* If not NULL, during packfile negotiation, fetch-pack will send "have"
@@ -43,6 +44,7 @@ struct fetch_pack_args {
unsigned reject_shallow_remote:1;
unsigned deepen:1;
unsigned refetch:1;
+ unsigned object_info:1;
/*
* Indicate that the remote of this request is a promisor remote. The
@@ -69,6 +71,12 @@ struct fetch_pack_args {
unsigned connectivity_checked:1;
};
+struct object_info_args {
+ struct string_list *object_info_options;
+ const struct string_list *server_options;
+ struct oid_array *oids;
+};
+
/*
* sought represents remote references that should be updated from.
* On return, the names that were found on the remote will have been
@@ -102,4 +110,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+void send_object_info_request(int fd_out, struct object_info_args *args);
+
#endif
diff --git a/transport-helper.c b/transport-helper.c
index 322c722478..48a6680200 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport,
/*
* If we reach here, then the server, the client, and/or the transport
- * helper does not support protocol v2. --negotiate-only requires
- * protocol v2.
+ * helper does not support protocol v2. --negotiate-only and --object-info
+ * require protocol v2.
*/
if (data->transport_options.acked_commits) {
warning(_("--negotiate-only requires protocol v2"));
return -1;
}
+ if (transport->smart_options->object_info) {
+ die(_("--object-info requires protocol v2"));
+ }
if (!data->get_refs_list_called)
get_refs_list_using_list(transport, 0);
diff --git a/transport.c b/transport.c
index 52db7a3cb0..2d503e2fbd 100644
--- a/transport.c
+++ b/transport.c
@@ -353,6 +353,80 @@ static struct ref *handshake(struct transport *transport, int for_push,
return refs;
}
+static int fetch_object_info(struct transport *transport, struct object_info **object_info_data)
+{
+ size_t i;
+ int size_index = -1;
+ struct git_transport_data *data = transport->data;
+ struct object_info_args args;
+ struct packet_reader reader;
+
+ memset(&args, 0, sizeof(args));
+ args.server_options = transport->server_options;
+ args.object_info_options = transport->smart_options->object_info_options;
+ args.oids = transport->smart_options->object_info_oids;
+
+ connect_setup(transport, 0);
+ packet_reader_init(&reader, data->fd[0], NULL, 0,
+ PACKET_READ_CHOMP_NEWLINE |
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_DIE_ON_ERR_PACKET);
+ data->version = discover_version(&reader);
+
+ transport->hash_algo = reader.hash_algo;
+
+ switch (data->version) {
+ case protocol_v2:
+ if (!server_supports_v2("object-info", 0))
+ return -1;
+ if (unsorted_string_list_has_string(args.object_info_options, "size")
+ && !server_supports_feature("object-info", "size", 0)) {
+ return -1;
+ }
+ if (unsorted_string_list_has_string(args.object_info_options, "type")
+ && !server_supports_feature("object-info", "type", 0)) {
+ return -1;
+ }
+ send_object_info_request(data->fd[1], &args);
+ break;
+ case protocol_v1:
+ case protocol_v0:
+ die(_("wrong protocol version. expected v2"));
+ case protocol_unknown_version:
+ BUG("unknown protocol version");
+ }
+
+ for (i = 0; i < args.object_info_options->nr; i++) {
+ if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+ return -1;
+ }
+ if (unsorted_string_list_has_string(args.object_info_options, reader.line)) {
+ if (!strcmp(reader.line, "size"))
+ size_index = i;
+ continue;
+ }
+ return -1;
+ }
+
+ i = 0;
+ while (packet_reader_read(&reader) == PACKET_READ_NORMAL && i < args.oids->nr) {
+ struct string_list object_info_values = STRING_LIST_INIT_DUP;
+
+ string_list_split(&object_info_values, reader.line, ' ', -1);
+ if (0 <= size_index) {
+ if (!strcmp(object_info_values.items[1 + size_index].string, ""))
+ die("object-info: not our ref %s",
+ object_info_values.items[0].string);
+ *(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 + size_index].string, NULL, 10);
+ }
+ i++;
+ }
+ check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
+
+ return 0;
+}
+
static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
struct transport_ls_refs_options *options)
{
@@ -363,10 +437,12 @@ static int fetch_refs_via_pack(struct transport *transport,
int nr_heads, struct ref **to_fetch)
{
int ret = 0;
+ size_t i;
struct git_transport_data *data = transport->data;
struct ref *refs = NULL;
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
+ struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
memset(&args, 0, sizeof(args));
args.uploadpack = data->options.uploadpack;
@@ -392,8 +468,25 @@ static int fetch_refs_via_pack(struct transport *transport,
args.server_options = transport->server_options;
args.negotiation_tips = data->options.negotiation_tips;
args.reject_shallow_remote = transport->smart_options->reject_shallow;
-
- if (!data->got_remote_heads) {
+ args.object_info = transport->smart_options->object_info;
+
+ if (transport->smart_options && transport->smart_options->object_info) {
+ struct ref *ref = object_info_refs;
+
+ if (!fetch_object_info(transport, data->options.object_info_data))
+ goto cleanup;
+ args.object_info_data = data->options.object_info_data;
+ args.quiet = 1;
+ args.no_progress = 1;
+ for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
+ struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
+ temp_ref->old_oid = *(transport->smart_options->object_info_oids->oid + i);
+ temp_ref->exact_oid = 1;
+ ref->next = temp_ref;
+ ref = ref->next;
+ }
+ transport->remote_refs = object_info_refs->next;
+ } else if (!data->got_remote_heads) {
int i;
int must_list_refs = 0;
for (i = 0; i < nr_heads; i++) {
@@ -433,12 +526,22 @@ static int fetch_refs_via_pack(struct transport *transport,
to_fetch, nr_heads, &data->shallow,
&transport->pack_lockfiles, data->version);
+ if (args.object_info) {
+ struct ref *ref_cpy_reader = object_info_refs->next;
+ int i = 0;
+ while (ref_cpy_reader) {
+ oid_object_info_extended(the_repository, &ref_cpy_reader->old_oid, &(*args.object_info_data)[i], OBJECT_INFO_LOOKUP_REPLACE);
+ ref_cpy_reader = ref_cpy_reader->next;
+ i++;
+ }
+ }
+
data->got_remote_heads = 0;
data->options.self_contained_and_connected =
args.self_contained_and_connected;
data->options.connectivity_checked = args.connectivity_checked;
- if (!refs)
+ if (refs == NULL && !args.object_info)
ret = -1;
if (report_unmatched_refs(to_fetch, nr_heads))
ret = -1;
@@ -453,6 +556,7 @@ static int fetch_refs_via_pack(struct transport *transport,
free_refs(refs_tmp);
free_refs(refs);
+ free_refs(object_info_refs);
return ret;
}
diff --git a/transport.h b/transport.h
index b5bf7b3e70..5512fdb140 100644
--- a/transport.h
+++ b/transport.h
@@ -6,6 +6,7 @@
#include "remote.h"
#include "list-objects-filter-options.h"
#include "string-list.h"
+#include "object-store.h"
struct git_transport_options {
unsigned thin : 1;
@@ -31,6 +32,12 @@ struct git_transport_options {
*/
unsigned connectivity_checked:1;
+ /*
+ * Transport will attempt to pull only object-info. Fallbacks
+ * to pulling entire object if object-info is not supported
+ */
+ unsigned object_info : 1;
+
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
@@ -54,6 +61,10 @@ struct git_transport_options {
* common commits to this oidset instead of fetching any packfiles.
*/
struct oidset *acked_commits;
+
+ struct oid_array *object_info_oids;
+ struct object_info **object_info_data;
+ struct string_list *object_info_options;
};
enum transport_family {
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
@ 2022-07-29 18:06 ` Junio C Hamano
2022-08-04 20:28 ` Calvin Wan
2022-08-01 13:30 ` Phillip Wood
` (2 subsequent siblings)
3 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-07-29 18:06 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> +void send_object_info_request(int fd_out, struct object_info_args *args)
> +{
> + struct strbuf req_buf = STRBUF_INIT;
> + size_t i;
> +
> + write_command_and_capabilities(&req_buf, args->server_options, "object-info");
> +
> + if (unsorted_string_list_has_string(args->object_info_options, "size"))
> + packet_buf_write(&req_buf, "size");
> +
> + if (unsorted_string_list_has_string(args->object_info_options, "type"))
> + packet_buf_write(&req_buf, "type");
> +
> + if (args->oids) {
> + for (i = 0; i < args->oids->nr; i++)
> + packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
> + }
If !args->oids then we say "we want to request object-info to learn
size and type for the following objects: oh, there are no objects we
are interested in". I wonder if an early return
if (!args->oids)
return;
at the beginning of the function that turns it into a benign no-op,
may make more sense? Calling "send_X()" helper and seeing nothing
come out on the wire might make it look awkward, though.
> @@ -363,10 +437,12 @@ static int fetch_refs_via_pack(struct transport *transport,
> int nr_heads, struct ref **to_fetch)
> {
> int ret = 0;
> + size_t i;
> struct git_transport_data *data = transport->data;
> struct ref *refs = NULL;
> struct fetch_pack_args args;
> struct ref *refs_tmp = NULL;
> + struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
Style: no SP between "sizeof" and "(".
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-07-29 18:06 ` Junio C Hamano
@ 2022-08-04 20:28 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-04 20:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86
> If !args->oids then we say "we want to request object-info to learn
> size and type for the following objects: oh, there are no objects we
> are interested in". I wonder if an early return
>
> if (!args->oids)
> return;
>
> at the beginning of the function that turns it into a benign no-op,
> may make more sense? Calling "send_X()" helper and seeing nothing
> come out on the wire might make it look awkward, though.
If I add this change, then I should also add a check to ensure
args->object_info_options is also not empty. The question I think we
should answer is whether checking for oids and options should
happen inside this function or not. I'm leaning towards outside
the function for the reason you stated above about nothing
coming out of the wire being awkward.
> > + struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
>
> Style: no SP between "sizeof" and "(".
ack
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
2022-07-29 18:06 ` Junio C Hamano
@ 2022-08-01 13:30 ` Phillip Wood
2022-08-04 22:20 ` Calvin Wan
2022-08-01 14:26 ` Phillip Wood
2022-08-08 9:16 ` Phillip Wood
3 siblings, 1 reply; 103+ messages in thread
From: Phillip Wood @ 2022-08-01 13:30 UTC (permalink / raw)
To: Calvin Wan, git; +Cc: gitster, jonathantanmy, philipoakley, johncai86
Hi Calvin
On 29/07/2022 00:02, Calvin Wan wrote:
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch adds client functions to communicate with the
> server.
>
> The client currently supports requesting a list of object ids with
> features 'size' and 'type' from a v2 server. If a server does not
> advertise either of the requested features, then the client falls back
> to making the request through 'fetch'.
I'm confused by the 'type' support, I might have missed something as I'm
not familiar with this code but I couldn't see where we parse the type
returned by the server.
> + for (i = 0; i < args.object_info_options->nr; i++) {
> + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
> + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
This is one of a number of lines in this series that are way over the 80
column limit.
> + return -1;
> + }
> + if (unsorted_string_list_has_string(args.object_info_options, reader.line)) {
> + if (!strcmp(reader.line, "size"))
> + size_index = i;
Should we be checking for "type" as well? Also does protocol-v2.txt need
updating as it only mentions "size" as an attribute.
> + continue;
> + }
> + return -1;
> + }
> +
> + i = 0;
> + while (packet_reader_read(&reader) == PACKET_READ_NORMAL && i < args.oids->nr) {
> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
> +
> + string_list_split(&object_info_values, reader.line, ' ', -1);
> + if (0 <= size_index) {
To avoid a possible out-of-bounds access we need to check that
size_index + 1 < object_info_value.nr in case the server response is
malformed
> + if (!strcmp(object_info_values.items[1 + size_index].string, ""))
> + die("object-info: not our ref %s",
I'm a bit confused by this message is it trying to say "object %s is
missing on the server"?
> + object_info_values.items[0].string);
> + *(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 + size_index].string, NULL, 10);
As Junio pointed out in his comments in v4 there is no error checking
here - we should check the server has actually returned a number. Note
that strtoul() will happily parse negative numbers so we probably want
to do something like
const char *endp
errno = 0
if (!isdigit(*object_info_values.items[1 + size_index].string))
die("...")
*(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 +
size_index].string, &endp, 10);
if (errno || *endp)
die("...")
Should be we checking the object id matches what we asked for? (I'm not
sure if protocol-v2.txt mentions the order in which the objects are
returned)
Should we be parsing the object type here as well?
> @@ -392,8 +468,25 @@ static int fetch_refs_via_pack(struct transport *transport,
> args.server_options = transport->server_options;
> args.negotiation_tips = data->options.negotiation_tips;
> args.reject_shallow_remote = transport->smart_options->reject_shallow;
> -
> - if (!data->got_remote_heads) {
> + args.object_info = transport->smart_options->object_info;
> +
> + if (transport->smart_options && transport->smart_options->object_info) {
> + struct ref *ref = object_info_refs;
> +
> + if (!fetch_object_info(transport, data->options.object_info_data))
> + goto cleanup;
> + args.object_info_data = data->options.object_info_data;
> + args.quiet = 1;
> + args.no_progress = 1;
> + for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
> + struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
Using CALLOC_ARRAY() or p = xcalloc(1, sizeof(*p)) would be safer here
(and everywhere else where you have used xcalloc()) as it ensures we
allocate the correct size.
> diff --git a/transport.h b/transport.h
> index b5bf7b3e70..5512fdb140 100644
> --- a/transport.h
> +++ b/transport.h
> @@ -31,6 +32,12 @@ struct git_transport_options {
> */
> unsigned connectivity_checked:1;
>
> + /*
> + * Transport will attempt to pull only object-info. Fallbacks
> + * to pulling entire object if object-info is not supported
> + */
Is it definitely true that we fallback to pulling the entire object? -
there is at least one place above where we do
> + if (transport->smart_options->object_info) {
> + die(_("--object-info requires protocol v2"));
> + }
>
Best Wishes
Phillip
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-08-01 13:30 ` Phillip Wood
@ 2022-08-04 22:20 ` Calvin Wan
2022-08-08 10:07 ` Phillip Wood
0 siblings, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-08-04 22:20 UTC (permalink / raw)
To: phillip.wood; +Cc: git, gitster, jonathantanmy, philipoakley, johncai86
> > The client currently supports requesting a list of object ids with
> > features 'size' and 'type' from a v2 server. If a server does not
> > advertise either of the requested features, then the client falls back
> > to making the request through 'fetch'.
>
> I'm confused by the 'type' support, I might have missed something as I'm
> not familiar with this code but I couldn't see where we parse the type
> returned by the server.
I should clarify that the server does not support 'type', only the client.
Since the client falls back to 'fetch' to grab the object info if the server
doesn't support a requested option (e.g. type), I have 'type' included
as part of the supported client options.
> > + for (i = 0; i < args.object_info_options->nr; i++) {
> > + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
> > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
>
> This is one of a number of lines in this series that are way over the 80
> column limit.
ack
> > + if (unsorted_string_list_has_string(args.object_info_options, reader.line)) {
> > + if (!strcmp(reader.line, "size"))
> > + size_index = i;
>
> Should we be checking for "type" as well? Also does protocol-v2.txt need
> updating as it only mentions "size" as an attribute.
I gave context above -- the server does not accept 'type' so protocol-v2.txt
does not need to be updated.
> To avoid a possible out-of-bounds access we need to check that
> size_index + 1 < object_info_value.nr in case the server response is
> malformed
ack
> > + if (!strcmp(object_info_values.items[1 + size_index].string, ""))
> > + die("object-info: not our ref %s",
>
> I'm a bit confused by this message is it trying to say "object %s is
> missing on the server"?
Correct. You'll find the same error message in upload-pack.c
> > + object_info_values.items[0].string);
> > + *(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 + size_index].string, NULL, 10);
>
> As Junio pointed out in his comments in v4 there is no error checking
> here - we should check the server has actually returned a number. Note
> that strtoul() will happily parse negative numbers so we probably want
> to do something like
>
> const char *endp
> errno = 0
> if (!isdigit(*object_info_values.items[1 + size_index].string))
> die("...")
> *(*object_info_data)[i].sizep = strtoul(object_info_values.items[1 +
> size_index].string, &endp, 10);
> if (errno || *endp)
> die("...")
> Should be we checking the object id matches what we asked for? (I'm not
> sure if protocol-v2.txt mentions the order in which the objects are
> returned)
Hmmmm I think I either check for an object id match or update
protocol-v2.txt to mention order is consistent.
> Should we be parsing the object type here as well?
When the server starts supporting it.
> > + for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
> > + struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
>
> Using CALLOC_ARRAY() or p = xcalloc(1, sizeof(*p)) would be safer here
> (and everywhere else where you have used xcalloc()) as it ensures we
> allocate the correct size.
ack
> > + /*
> > + * Transport will attempt to pull only object-info. Fallbacks
> > + * to pulling entire object if object-info is not supported
> > + */
>
> Is it definitely true that we fallback to pulling the entire object? -
> there is at least one place above where we do
Yes, fetch_refs_via_pack() is where fetch_object_info() is called,
making it easy to fallback if fetch_object_info() fails.
>> + while (packet_reader_read(&reader) == PACKET_READ_NORMAL && i < args.oids->nr) {
>> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
>
> I forget to say earlier that this is leaked
ack
Thank you for the review!
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-08-04 22:20 ` Calvin Wan
@ 2022-08-08 10:07 ` Phillip Wood
0 siblings, 0 replies; 103+ messages in thread
From: Phillip Wood @ 2022-08-08 10:07 UTC (permalink / raw)
To: Calvin Wan, phillip.wood
Cc: git, gitster, jonathantanmy, philipoakley, johncai86
Hi Calvin
On 04/08/2022 23:20, Calvin Wan wrote:
>>> The client currently supports requesting a list of object ids with
>>> features 'size' and 'type' from a v2 server. If a server does not
>>> advertise either of the requested features, then the client falls back
>>> to making the request through 'fetch'.
>>
>> I'm confused by the 'type' support, I might have missed something as I'm
>> not familiar with this code but I couldn't see where we parse the type
>> returned by the server.
>
> I should clarify that the server does not support 'type', only the client.
> Since the client falls back to 'fetch' to grab the object info if the server
> doesn't support a requested option (e.g. type), I have 'type' included
> as part of the supported client options.
Does that mean all the type client code is effectively unused and so
untested?
>>> + if (!strcmp(object_info_values.items[1 + size_index].string, ""))
>>> + die("object-info: not our ref %s",
>>
>> I'm a bit confused by this message is it trying to say "object %s is
>> missing on the server"?
>
> Correct. You'll find the same error message in upload-pack.c
I find the message confusing as we're talking about oids not refs. I
have also realized that dying here is incompatible with the normal
cat-file behavior of printing "<objectname> missing" rather than dying
when a missing object is queried.
>> Should be we checking the object id matches what we asked for? (I'm not
>> sure if protocol-v2.txt mentions the order in which the objects are
>> returned)
>
> Hmmmm I think I either check for an object id match or update
> protocol-v2.txt to mention order is consistent.
If we can then updating the protocol makes sense. Even if we do that a
buggy or malicious server could return the objects in a different order.
A malicious server can also lie about the size so I'm not sure how much
benefit there is in checking the oids in terms of validating the
response but I think we should definitely check that the server returns
the expected number of sizes.
Best Wishes
Phillip
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
2022-07-29 18:06 ` Junio C Hamano
2022-08-01 13:30 ` Phillip Wood
@ 2022-08-01 14:26 ` Phillip Wood
2022-08-08 9:16 ` Phillip Wood
3 siblings, 0 replies; 103+ messages in thread
From: Phillip Wood @ 2022-08-01 14:26 UTC (permalink / raw)
To: Calvin Wan, git; +Cc: gitster, jonathantanmy, philipoakley, johncai86
Hi Calvin
On 29/07/2022 00:02, Calvin Wan wrote:
> diff --git a/transport.c b/transport.c
> index 52db7a3cb0..2d503e2fbd 100644
> --- a/transport.c
> +++ b/transport.c
> + i = 0;
> + while (packet_reader_read(&reader) == PACKET_READ_NORMAL && i < args.oids->nr) {
> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
I forget to say earlier that this is leaked
Best Wishes
Phillip
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 5/6] transport: add client support for object-info
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
` (2 preceding siblings ...)
2022-08-01 14:26 ` Phillip Wood
@ 2022-08-08 9:16 ` Phillip Wood
3 siblings, 0 replies; 103+ messages in thread
From: Phillip Wood @ 2022-08-08 9:16 UTC (permalink / raw)
To: Calvin Wan, git; +Cc: gitster, jonathantanmy, philipoakley, johncai86
Hi Calvin
On 29/07/2022 00:02, Calvin Wan wrote:
> diff --git a/transport.c b/transport.c
> index 52db7a3cb0..2d503e2fbd 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -363,10 +437,12 @@ static int fetch_refs_via_pack(struct transport *transport,
> int nr_heads, struct ref **to_fetch)
> {
> int ret = 0;
> + size_t i;
> struct git_transport_data *data = transport->data;
> struct ref *refs = NULL;
> struct fetch_pack_args args;
> struct ref *refs_tmp = NULL;
> + struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
When git is compiled with SANITIZE=address then one of the cat-file
tests added in patch 6 fails with an out of bounds access. The problem
is that the last member of struct ref is a flexible array that is
expected to contain a NUL terminated string but we don't allocate any
memory for the string here. We could just add one to the size of the
allocation but as there is a dedicated allocation function it would be
better to use alloc_ref("").
I think it would also be worth delaying this allocation until we're sure
it is going to be needed.
> memset(&args, 0, sizeof(args));
> args.uploadpack = data->options.uploadpack;
> @@ -392,8 +468,25 @@ static int fetch_refs_via_pack(struct transport *transport,
> args.server_options = transport->server_options;
> args.negotiation_tips = data->options.negotiation_tips;
> args.reject_shallow_remote = transport->smart_options->reject_shallow;
> -
> - if (!data->got_remote_heads) {
> + args.object_info = transport->smart_options->object_info;
> +
> + if (transport->smart_options && transport->smart_options->object_info) {
> + struct ref *ref = object_info_refs;
> +
> + if (!fetch_object_info(transport, data->options.object_info_data))
> + goto cleanup;
If we allocate object_info_refs and initialize ref here then we avoid
allocating it when it is not needed.
> + args.object_info_data = data->options.object_info_data;
> + args.quiet = 1;
> + args.no_progress = 1;
> + for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
> + struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
This needs to use alloc_ref("") as well
> + temp_ref->old_oid = *(transport->smart_options->object_info_oids->oid + i);
This would be clearer as ...oids->oid[i] I think
Best Wishes
Phillip
^ permalink raw reply [flat|nested] 103+ messages in thread
* [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (13 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
@ 2022-07-28 23:02 ` Calvin Wan
2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
` (2 more replies)
2022-07-31 15:02 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command ZheNing Hu
2022-08-13 22:17 ` Junio C Hamano
16 siblings, 3 replies; 103+ messages in thread
From: Calvin Wan @ 2022-07-28 23:02 UTC (permalink / raw)
To: git; +Cc: Calvin Wan, gitster, jonathantanmy, philipoakley, johncai86
Since the `info` command in cat-file --batch-command prints object info
for a given object, it is natural to add another command in cat-file
--batch-command to print object info for a given object from a remote.
Add `remote-object-info` to cat-file --batch-command.
While `info` takes object ids one at a time, this creates overhead when
making requests to a server so `remote-object-info` instead can take
multiple object ids at once.
cat-file --batch-command is generally implemented in the following
manner:
- Receive and parse input from user
- Call respective function attached to command
- Set batch mode state, get object info, print object info
In --buffer mode, this changes to:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue
- Call respective function attached to command
- Set batch mode state, get object info, print object info
Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:
- Receive and parse input from user
If command is `remote-object-info`:
- Get object info from remote
- Loop through object info
- Call respective function attached to `info`
- Set batch mode state, use passed in object info, print object
info
Else:
- Call respective function attached to command
- Parse input, get object info, print object info
And finally for --buffer mode `remote-object-info`:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue:
If command is `remote-object-info`:
- Get object info from remote
- Loop through object info
- Call respective function attached to `info`
- Set batch mode state, use passed in object info, print
object info
Else:
- Call respective function attached to command
- Set batch mode state, get object info, print object info
To summarize, `remote-object-info` gets object info from the remote and
then generates multiple `info` commands with the object info passed in.
In order for remote-object-info to avoid remote communication overhead
in the non-buffer mode, the objects are passed in as such:
remote-object-info <remote> <oid> <oid> ... <oid>
rather than
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
...
remote-object-info <remote> <oid>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
---
Documentation/git-cat-file.txt | 16 +-
builtin/cat-file.c | 225 ++++++++++++++++-----
object-file.c | 11 ++
object-store.h | 3 +
t/t1006-cat-file.sh | 348 +++++++++++++++++++++++++++++++++
5 files changed, 549 insertions(+), 54 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..c073d5e50d 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -115,6 +115,10 @@ info <object>::
Print object info for object reference `<object>`. This corresponds to the
output of `--batch-check`.
+remote-object-info <remote> [<object>]::
+ Print object info for object references `<object>` at specified <remote>.
+ This command may be combined with `--buffer`.
+
flush::
Used with `--buffer` to execute all preceding commands that were issued
since the beginning or since the last flush was issued. When `--buffer`
@@ -245,7 +249,8 @@ newline. The available atoms are:
The full hex representation of the object name.
`objecttype`::
- The type of the object (the same as `cat-file -t` reports).
+ The type of the object (the same as `cat-file -t` reports). See
+ `CAVEATS` below.
`objectsize`::
The size, in bytes, of the object (the same as `cat-file -s`
@@ -253,13 +258,14 @@ newline. The available atoms are:
`objectsize:disk`::
The size, in bytes, that the object takes up on disk. See the
- note about on-disk sizes in the `CAVEATS` section below.
+ note about on-disk sizes in the `CAVEATS` section below. Not
+ supported by `remote-object-info`.
`deltabase`::
If the object is stored as a delta on-disk, this expands to the
full hex representation of the delta base object name.
Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
- below.
+ below. Not supported by `remote-object-info`.
`rest`::
If this atom is used in the output string, input lines are split
@@ -346,6 +352,10 @@ directory name.
CAVEATS
-------
+Note that since object type is currently not supported by the
+object-info command request, git fetches the entire object instead to
+get the object info.
+
Note that the sizes of objects on disk are reported accurately, but care
should be taken in drawing conclusions about which refs or objects are
responsible for disk usage. The size of a packed non-delta object may be
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..68b72377e6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,9 @@
#include "packfile.h"
#include "object-store.h"
#include "promisor-remote.h"
+#include "alias.h"
+#include "remote.h"
+#include "transport.h"
enum batch_mode {
BATCH_MODE_CONTENTS,
@@ -32,9 +35,14 @@ struct batch_options {
int unordered;
int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
const char *format;
+ int use_remote_info;
};
+#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
+
static const char *force_path;
+static struct object_info *remote_object_info;
+static struct oid_array object_info_oids = OID_ARRAY_INIT;
static int filter_object(const char *path, unsigned mode,
const struct object_id *oid,
@@ -423,48 +431,115 @@ static void batch_one_object(const char *obj_name,
int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
enum get_oid_result result;
- result = get_oid_with_context(the_repository, obj_name,
- flags, &data->oid, &ctx);
- if (result != FOUND) {
- switch (result) {
- case MISSING_OBJECT:
- printf("%s missing\n", obj_name);
- break;
- case SHORT_NAME_AMBIGUOUS:
- printf("%s ambiguous\n", obj_name);
- break;
- case DANGLING_SYMLINK:
- printf("dangling %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- case SYMLINK_LOOP:
- printf("loop %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- case NOT_DIR:
- printf("notdir %"PRIuMAX"\n%s\n",
- (uintmax_t)strlen(obj_name), obj_name);
- break;
- default:
- BUG("unknown get_sha1_with_context result %d\n",
- result);
- break;
+ if (!opt->use_remote_info) {
+ result = get_oid_with_context(the_repository, obj_name,
+ flags, &data->oid, &ctx);
+ if (result != FOUND) {
+ switch (result) {
+ case MISSING_OBJECT:
+ printf("%s missing\n", obj_name);
+ break;
+ case SHORT_NAME_AMBIGUOUS:
+ printf("%s ambiguous\n", obj_name);
+ break;
+ case DANGLING_SYMLINK:
+ printf("dangling %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ case SYMLINK_LOOP:
+ printf("loop %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ case NOT_DIR:
+ printf("notdir %"PRIuMAX"\n%s\n",
+ (uintmax_t)strlen(obj_name), obj_name);
+ break;
+ default:
+ BUG("unknown get_sha1_with_context result %d\n",
+ result);
+ break;
+ }
+ fflush(stdout);
+ return;
}
- fflush(stdout);
- return;
- }
- if (ctx.mode == 0) {
- printf("symlink %"PRIuMAX"\n%s\n",
- (uintmax_t)ctx.symlink_path.len,
- ctx.symlink_path.buf);
- fflush(stdout);
- return;
+ if (ctx.mode == 0) {
+ printf("symlink %"PRIuMAX"\n%s\n",
+ (uintmax_t)ctx.symlink_path.len,
+ ctx.symlink_path.buf);
+ fflush(stdout);
+ return;
+ }
}
batch_object_write(obj_name, scratch, opt, data, NULL, 0);
}
+static int get_remote_info(struct batch_options *opt, int argc, const char **argv)
+{
+ int retval = 0;
+ size_t i;
+ struct remote *remote = NULL;
+ struct object_id oid;
+ struct string_list object_info_options = STRING_LIST_INIT_NODUP;
+ static struct transport *gtransport;
+ char *format = DEFAULT_FORMAT;
+
+ if (opt->format)
+ format = xstrdup(opt->format);
+
+ remote = remote_get(argv[0]);
+ if (!remote)
+ die(_("must supply valid remote when using --object-info"));
+ oid_array_clear(&object_info_oids);
+ for (i = 1; i < argc; i++) {
+ if (get_oid(argv[i], &oid))
+ die(_("malformed object id '%s'"), argv[i]);
+ oid_array_append(&object_info_oids, &oid);
+ }
+
+ gtransport = transport_get(remote, NULL);
+ if (gtransport->smart_options) {
+ size_t j;
+ int include_size = 0, include_type = 0;
+
+ remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
+ gtransport->smart_options->object_info = 1;
+ gtransport->smart_options->object_info_oids = &object_info_oids;
+ /**
+ * 'size' is the only option currently supported.
+ * Other options that are passed in the format will fallback to a
+ * standard fetch request rather than object-info.
+ */
+ if (strstr(format, "%(objectsize)")) {
+ string_list_append(&object_info_options, "size");
+ include_size = 1;
+ }
+ if (strstr(format, "%(objecttype)")) {
+ string_list_append(&object_info_options, "type");
+ include_type = 1;
+ }
+ if (strstr(format, "%(objectsize:disk)"))
+ die(_("objectsize:disk is currently not supported with remote-object-info"));
+ if (strstr(format, "%(deltabase)"))
+ die(_("deltabase is currently not supported with remote-object-info"));
+ if (object_info_options.nr > 0) {
+ gtransport->smart_options->object_info_options = &object_info_options;
+ for (j = 0; j < object_info_oids.nr; j++) {
+ if (include_size)
+ remote_object_info[j].sizep = xcalloc(1, sizeof(long));
+ if (include_type)
+ remote_object_info[j].typep = xcalloc(1, sizeof(enum object_type));
+ }
+ gtransport->smart_options->object_info_data = &remote_object_info;
+ retval = transport_fetch_refs(gtransport, NULL);
+ }
+ } else {
+ retval = -1;
+ }
+ return retval;
+}
+
struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
@@ -536,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
struct queued_cmd {
parse_cmd_fn_t fn;
char *line;
+ const char *name;
};
static void parse_cmd_contents(struct batch_options *opt,
@@ -556,6 +632,56 @@ static void parse_cmd_info(struct batch_options *opt,
batch_one_object(line, output, opt, data);
}
+static const struct parse_cmd {
+ const char *name;
+ parse_cmd_fn_t fn;
+ unsigned takes_args;
+} commands[] = {
+ { "contents", parse_cmd_contents, 1},
+ { "info", parse_cmd_info, 1},
+ { "remote-object-info", parse_cmd_info, 1},
+ { "flush", NULL, 0},
+};
+
+static void parse_remote_info(struct batch_options *opt,
+ char *line,
+ struct strbuf *output,
+ struct expand_data *data,
+ const struct parse_cmd *p_cmd,
+ struct queued_cmd *q_cmd)
+{
+ int count;
+ size_t i;
+ const char **argv;
+
+ count = split_cmdline(line, &argv);
+ if (get_remote_info(opt, count, argv))
+ goto cleanup;
+ opt->use_remote_info = 1;
+ data->skip_object_info = 1;
+ data->mark_query = 0;
+ for (i = 0; i < object_info_oids.nr; i++) {
+ if (remote_object_info[i].sizep)
+ data->size = *remote_object_info[i].sizep;
+ if (remote_object_info[i].typep)
+ data->type = *remote_object_info[i].typep;
+
+ data->oid = object_info_oids.oid[i];
+ if (p_cmd)
+ p_cmd->fn(opt, argv[i+1], output, data);
+ else
+ q_cmd->fn(opt, argv[i+1], output, data);
+ }
+ opt->use_remote_info = 0;
+ data->skip_object_info = 0;
+ data->mark_query = 1;
+
+cleanup:
+ for (i = 0; i < object_info_oids.nr; i++)
+ free_object_info_contents(&remote_object_info[i]);
+ free(remote_object_info);
+}
+
static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
@@ -567,8 +693,12 @@ static void dispatch_calls(struct batch_options *opt,
if (!opt->buffer_output)
die(_("flush is only for --buffer mode"));
- for (i = 0; i < nr; i++)
- cmd[i].fn(opt, cmd[i].line, output, data);
+ for (i = 0; i < nr; i++) {
+ if (!strcmp(cmd[i].name, "remote-object-info"))
+ parse_remote_info(opt, cmd[i].line, output, data, NULL, &cmd[i]);
+ else
+ cmd[i].fn(opt, cmd[i].line, output, data);
+ }
fflush(stdout);
}
@@ -583,17 +713,6 @@ static void free_cmds(struct queued_cmd *cmd, size_t *nr)
*nr = 0;
}
-
-static const struct parse_cmd {
- const char *name;
- parse_cmd_fn_t fn;
- unsigned takes_args;
-} commands[] = {
- { "contents", parse_cmd_contents, 1},
- { "info", parse_cmd_info, 1},
- { "flush", NULL, 0},
-};
-
static void batch_objects_command(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data)
@@ -639,11 +758,17 @@ static void batch_objects_command(struct batch_options *opt,
dispatch_calls(opt, output, data, queued_cmd, nr);
free_cmds(queued_cmd, &nr);
} else if (!opt->buffer_output) {
- cmd->fn(opt, p, output, data);
+ if (!strcmp(cmd->name, "remote-object-info")) {
+ char *line = xstrdup_or_null(p);
+ parse_remote_info(opt, line, output, data, cmd, NULL);
+ } else {
+ cmd->fn(opt, p, output, data);
+ }
} else {
ALLOC_GROW(queued_cmd, nr + 1, alloc);
call.fn = cmd->fn;
call.line = xstrdup_or_null(p);
+ call.name = cmd->name;
queued_cmd[nr++] = call;
}
}
@@ -659,8 +784,6 @@ static void batch_objects_command(struct batch_options *opt,
strbuf_release(&input);
}
-#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"
-
static int batch_objects(struct batch_options *opt)
{
struct strbuf input = STRBUF_INIT;
diff --git a/object-file.c b/object-file.c
index 5b270f046d..417c76def3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2839,3 +2839,14 @@ int read_loose_object(const char *path,
munmap(map, mapsize);
return ret;
}
+
+void free_object_info_contents(struct object_info *object_info)
+{
+ if (!object_info)
+ return;
+ free(object_info->typep);
+ free(object_info->sizep);
+ free(object_info->disk_sizep);
+ free(object_info->delta_base_oid);
+ free(object_info->type_name);
+}
diff --git a/object-store.h b/object-store.h
index 5222ee5460..015838eb08 100644
--- a/object-store.h
+++ b/object-store.h
@@ -547,4 +547,7 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);
+/* Free pointers inside of object_info, but not object_info itself */
+void free_object_info_contents(struct object_info *object_info);
+
#endif /* OBJECT_STORE_H */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dadf3b1458..55b2a1f06f 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -2,6 +2,9 @@
test_description='git cat-file'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
. ./test-lib.sh
test_cmdmode_usage () {
@@ -1093,4 +1096,349 @@ test_expect_success 'batch-command flush without --buffer' '
grep "^fatal:.*flush is only for --buffer mode.*" err
'
+# This section tests --batch-command with remote-object-info command
+# If a filter is not set, the filter defaults to "%(objectname) %(objectsize) %(objecttype)"
+# Since "%(objecttype)" is currently not supported by the command request, object-info,
+# the filters are set to "%(objectname) %(objectsize)".
+# Tests with the default filter are used to test the fallback to 'fetch' command
+
+set_transport_variables () {
+ tree_sha1=$(git -C "$1" write-tree)
+ commit_sha1=$(echo_without_newline "$commit_message" | git -C "$1" commit-tree $tree_sha1)
+ tag_sha1=$(echo_without_newline "$tag_content" | git -C "$1" hash-object -t tag --stdin -w)
+ tag_size=$(strlen "$tag_content")
+}
+
+# Test --batch-command remote-object-info with 'git://' transport
+
+. "$TEST_DIRECTORY"/lib-git-daemon.sh
+start_git_daemon --export-all --enable=receive-pack
+daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
+
+test_expect_success 'create repo to be served by git-daemon' '
+ git init "$daemon_parent" &&
+ echo_without_newline "$hello_content" > $daemon_parent/hello &&
+ git -C "$daemon_parent" update-index --add hello
+'
+
+set_transport_variables "$daemon_parent"
+
+test_expect_success 'batch-command remote-object-info git://' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info git:// multiple sha1 per line' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// default filter' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+ GIT_TRACE_PACKET=1 git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info git://' '
+ (
+ cd "$daemon_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_sha1 $tree_sha1
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+stop_git_daemon
+
+# Test --batch-command remote-object-info with 'http://' transport
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create repo to be served by http:// transport' '
+ git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true &&
+ echo_without_newline "$hello_content" > $HTTPD_DOCUMENT_ROOT_PATH/http_parent/hello &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" update-index --add hello
+'
+set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent"
+
+test_expect_success 'batch-command remote-object-info http://' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// one line' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info http://' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// default filter' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1 $tree_sha1
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'remote-object-info fails on unspported filter option (objectsize:disk)' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "objectsize:disk is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on unspported filter option (deltabase)' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "deltabase is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol fallback' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
+ EOF
+ test_i18ngrep "object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_i18ngrep "malformed object id '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID fallback' '
+ (
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_i18ngrep "malformed object id '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on missing OID' '
+ git clone "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" missing_oid_repo &&
+ test_commit -C missing_oid_repo message1 c.txt &&
+ (
+ cd missing_oid_repo &&
+ object_id=$(git rev-parse message1:c.txt) &&
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $object_id
+ EOF
+ test_i18ngrep "object-info: not our ref $object_id" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on missing OID fallback' '
+ (
+ cd missing_oid_repo &&
+ object_id=$(git rev-parse message1:c.txt) &&
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $object_id
+ EOF
+ test_i18ngrep "fatal: remote error: upload-pack: not our ref $object_id" err
+ )
+'
+
+# Test --batch-command remote-object-info with 'file://' transport
+
+test_expect_success 'create repo to be served by file:// transport' '
+ git init server &&
+ git -C server config protocol.version 2 &&
+ echo_without_newline "$hello_content" > server/hello &&
+ git -C server update-index --add hello
+'
+
+set_transport_variables "server"
+
+test_expect_success 'batch-command remote-object-info file://' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1
+ remote-object-info "file://$(pwd)" $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1
+ remote-object-info "file://$(pwd)" $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// multiple sha1 per line' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1 $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info file://' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 $hello_size" >expect &&
+ echo "$tree_sha1 $tree_size" >>expect &&
+ echo "$commit_sha1 $commit_size" >>expect &&
+ echo "$tag_sha1 $tag_size" >>expect &&
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1 $tag_sha1
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// default filter' '
+ (
+ cd server &&
+
+ echo "$hello_sha1 blob $hello_size" >expect &&
+ echo "$tree_sha1 tree $tree_size" >>expect &&
+ echo "$commit_sha1 commit $commit_size" >>expect &&
+ echo "$tag_sha1 tag $tag_size" >>expect &&
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "file://$(pwd)" $hello_sha1 $tree_sha1
+ remote-object-info "file://$(pwd)" $commit_sha1 $tag_sha1
+ EOF
+ test_cmp expect actual
+ )
+'
+
test_done
--
2.37.1.455.g008518b4e5-goog
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-28 23:02 ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
@ 2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
2022-08-07 5:50 ` ZheNing Hu
2022-08-08 18:07 ` Calvin Wan
2022-07-29 18:21 ` Junio C Hamano
2022-09-28 13:12 ` Ævar Arnfjörð Bjarmason
2 siblings, 2 replies; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 6:25 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, gitster, jonathantanmy, philipoakley, johncai86
On Thu, Jul 28 2022, Calvin Wan wrote:
> Since the `info` command in cat-file --batch-command prints object info
> for a given object, it is natural to add another command in cat-file
> --batch-command to print object info for a given object from a remote.
Is it ?:)
> Add `remote-object-info` to cat-file --batch-command.
I realize this bit of implementation changed in v4, i.e. it used to be
in "fetch", and I'm happy to have it moved out of there, we don't need
to overload it more.
But I remember thinking (and perhaps commenting on-list, I can't
remember) that the "object-info" server verb was a bit odd at the time
that it was implemented. I understand the motivation, but surely it was
stumbling its way towards being something more generic, i.e. being able
to just expose cmd_cat_file() in some form.
Which is one of the goals I've had in mind with working on fixing memory
leaks in various places, i.e. once you get common commands to clean up
after themselves it usually becomes to have a "command server".
So (and I don't mind if this is longer term, just asking), is there a
reason for why we wouldn't want to do away with object-info and this
"cat-file talks to a remote", in favor of just having support for
invoking arbitrary commands from a remote.
Of course that set of allowed RCE commands would be zero by default, but
if we had some way to define tha "cat-file" was allowed to be called,
and only if you invoked:
cat-file --batch="%(objectsize)"
Or whatever, but over the v2 protocol, wouldn't we basically have
object-info in a more roundabout way?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
@ 2022-08-07 5:50 ` ZheNing Hu
2022-08-08 18:07 ` Calvin Wan
1 sibling, 0 replies; 103+ messages in thread
From: ZheNing Hu @ 2022-08-07 5:50 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Calvin Wan, Git List, Junio C Hamano, Jonathan Tan,
Philip Oakley, johncai86
Hi, Ævar,
Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2022年7月29日周五 14:33写道:
>
>
> On Thu, Jul 28 2022, Calvin Wan wrote:
>
> > Since the `info` command in cat-file --batch-command prints object info
> > for a given object, it is natural to add another command in cat-file
> > --batch-command to print object info for a given object from a remote.
>
> Is it ?:)
>
> > Add `remote-object-info` to cat-file --batch-command.
>
> I realize this bit of implementation changed in v4, i.e. it used to be
> in "fetch", and I'm happy to have it moved out of there, we don't need
> to overload it more.
>
> But I remember thinking (and perhaps commenting on-list, I can't
> remember) that the "object-info" server verb was a bit odd at the time
> that it was implemented. I understand the motivation, but surely it was
> stumbling its way towards being something more generic, i.e. being able
> to just expose cmd_cat_file() in some form.
>
> Which is one of the goals I've had in mind with working on fixing memory
> leaks in various places, i.e. once you get common commands to clean up
> after themselves it usually becomes to have a "command server".
>
Now I'm starting to agree with you on this:
Maybe now git doesn't have a good interface to execute normal git commands
(except git-upload-pack, git-receive-pack...) on remote git server.
This patch wants to get remote object-info by the git-upload-pack interface.
But this thing can easily work by some RPC server e.g. Gitaly in Gitlab.
I don't know if git itself has the need to reimplement these remote
calls In some
secure environment?... Is it perhaps possible to get better
performance and versatility
than Gitaly? I donno.
> So (and I don't mind if this is longer term, just asking), is there a
> reason for why we wouldn't want to do away with object-info and this
> "cat-file talks to a remote", in favor of just having support for
> invoking arbitrary commands from a remote.
>
> Of course that set of allowed RCE commands would be zero by default, but
> if we had some way to define tha "cat-file" was allowed to be called,
> and only if you invoked:
>
> cat-file --batch="%(objectsize)"
>
> Or whatever, but over the v2 protocol, wouldn't we basically have
> object-info in a more roundabout way?
ZheNing Hu
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
2022-08-07 5:50 ` ZheNing Hu
@ 2022-08-08 18:07 ` Calvin Wan
2022-08-11 10:58 ` Ævar Arnfjörð Bjarmason
1 sibling, 1 reply; 103+ messages in thread
From: Calvin Wan @ 2022-08-08 18:07 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, gitster, jonathantanmy, philipoakley, johncai86
> > Since the `info` command in cat-file --batch-command prints object info
> > for a given object, it is natural to add another command in cat-file
> > --batch-command to print object info for a given object from a remote.
>
> Is it ?:)
Haha yes this could use a little rewording
> > Add `remote-object-info` to cat-file --batch-command.
>
> I realize this bit of implementation changed in v4, i.e. it used to be
> in "fetch", and I'm happy to have it moved out of there, we don't need
> to overload it more.
>
> But I remember thinking (and perhaps commenting on-list, I can't
> remember) that the "object-info" server verb was a bit odd at the time
> that it was implemented. I understand the motivation, but surely it was
> stumbling its way towards being something more generic, i.e. being able
> to just expose cmd_cat_file() in some form.
>
> Which is one of the goals I've had in mind with working on fixing memory
> leaks in various places, i.e. once you get common commands to clean up
> after themselves it usually becomes to have a "command server".
>
> So (and I don't mind if this is longer term, just asking), is there a
> reason for why we wouldn't want to do away with object-info and this
> "cat-file talks to a remote", in favor of just having support for
> invoking arbitrary commands from a remote.
>
> Of course that set of allowed RCE commands would be zero by default, but
> if we had some way to define tha "cat-file" was allowed to be called,
> and only if you invoked:
>
> cat-file --batch="%(objectsize)"
>
> Or whatever, but over the v2 protocol, wouldn't we basically have
> object-info in a more roundabout way?
While I do think that if we did have a set of allowed RCE commands, this
would be a good candidate to be one of those commands. I am worried
about security, maintainability, and server performance risks this change
would also carry. Figuring out which commands are secure and would
not overload the server, and then maintaining that set seems like a much
more worrisome design than having a secure git server.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-08-08 18:07 ` Calvin Wan
@ 2022-08-11 10:58 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-11 10:58 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, gitster, jonathantanmy, philipoakley, johncai86
On Mon, Aug 08 2022, Calvin Wan wrote:
>> > Since the `info` command in cat-file --batch-command prints object info
>> > for a given object, it is natural to add another command in cat-file
>> > --batch-command to print object info for a given object from a remote.
>>
>> Is it ?:)
>
> Haha yes this could use a little rewording
>
>> > Add `remote-object-info` to cat-file --batch-command.
>>
>> I realize this bit of implementation changed in v4, i.e. it used to be
>> in "fetch", and I'm happy to have it moved out of there, we don't need
>> to overload it more.
>>
>> But I remember thinking (and perhaps commenting on-list, I can't
>> remember) that the "object-info" server verb was a bit odd at the time
>> that it was implemented. I understand the motivation, but surely it was
>> stumbling its way towards being something more generic, i.e. being able
>> to just expose cmd_cat_file() in some form.
>>
>> Which is one of the goals I've had in mind with working on fixing memory
>> leaks in various places, i.e. once you get common commands to clean up
>> after themselves it usually becomes to have a "command server".
>>
>> So (and I don't mind if this is longer term, just asking), is there a
>> reason for why we wouldn't want to do away with object-info and this
>> "cat-file talks to a remote", in favor of just having support for
>> invoking arbitrary commands from a remote.
>>
>> Of course that set of allowed RCE commands would be zero by default, but
>> if we had some way to define tha "cat-file" was allowed to be called,
>> and only if you invoked:
>>
>> cat-file --batch="%(objectsize)"
>>
>> Or whatever, but over the v2 protocol, wouldn't we basically have
>> object-info in a more roundabout way?
>
> While I do think that if we did have a set of allowed RCE commands, this
> would be a good candidate to be one of those commands. I am worried
> about security, maintainability, and server performance risks this change
> would also carry. Figuring out which commands are secure and would
> not overload the server, and then maintaining that set seems like a much
> more worrisome design than having a secure git server.
I'm only suggesting that the interface be extendable enough to allow for
that, not that we do it right now, i.e. that the protocol command could
be:
# With appropriate \0 delimiting etc.
cmd cat-file -e <SHA1>
As opposed to something like:
object-exists <SHA1>
But that we would *not* for now expose some mechanism where the server
operator could configure which arbitrary commands to allow, we'd just
intercept that specifically, and dispatch it to the appropriate code in
builtin/cat-file.c (or libify them enough to expose them, and then call
that).
Of course one of the points of doing so would be to eventually expose
the ability to run a larger set of safe commands remotely, if the server
operator agrees, but we'd leave that for the future.
Right now we'd get the benefit of not duplicating various parts of
existing code, and having a plain one-to-one mapping to existing
commands.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-28 23:02 ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
@ 2022-07-29 18:21 ` Junio C Hamano
2022-08-08 18:37 ` Calvin Wan
2022-09-28 13:12 ` Ævar Arnfjörð Bjarmason
2 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2022-07-29 18:21 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86
Calvin Wan <calvinwan@google.com> writes:
> While `info` takes object ids one at a time, this creates overhead when
> making requests to a server so `remote-object-info` instead can take
> multiple object ids at once.
I am not sure if the above is desirable---doesn't it make the
protocol (i.e. the requests the invoker of "cat-file" needs to
prepare) unnecessarily complex? If we have "buffered" mode, the
implementation can collect and batch requests and talk with the
other side in a single communication with the other end, no?
> In --buffer mode, this changes to:
>
> - Receive and parse input from user
> - Store respective function attached to command in a queue
> - After flush, loop through commands in queue
> - Call respective function attached to command
> - Set batch mode state, get object info, print object info
So, the existing buffered mode is poorly designed, in other words.
Isn't this something we can fix for performance without affecting
the existing callers?
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-29 18:21 ` Junio C Hamano
@ 2022-08-08 18:37 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-08 18:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jonathantanmy, philipoakley, johncai86
> > While `info` takes object ids one at a time, this creates overhead when
> > making requests to a server so `remote-object-info` instead can take
> > multiple object ids at once.
>
> I am not sure if the above is desirable---doesn't it make the
> protocol (i.e. the requests the invoker of "cat-file" needs to
> prepare) unnecessarily complex? If we have "buffered" mode, the
> implementation can collect and batch requests and talk with the
> other side in a single communication with the other end, no?
I'm a little bit confused as to what you are suggesting here, in
terms of how a user would want to call this command. Are you
saying this functionality should only work in "buffered" mode?
Or are you saying that `remote-object-info` should only take
one object id at a time, and in buffer mode, those requests are
batched together?
>
> > In --buffer mode, this changes to:
> >
> > - Receive and parse input from user
> > - Store respective function attached to command in a queue
> > - After flush, loop through commands in queue
> > - Call respective function attached to command
> > - Set batch mode state, get object info, print object info
>
> So, the existing buffered mode is poorly designed, in other words.
> Isn't this something we can fix for performance without affecting
> the existing callers?
Much of the existing cat-file code seems to be designed around
taking in one object/command at a time, and --buffer can be
thought of as a wrapper around that preexisting code that simply
stores a queue of commands rather than running them immediately.
I think the existing buffered mode is well designed -- this part of the
commit message is to explain why at some point I need to preprocess
what is in the buffer
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v5 6/6] cat-file: add remote-object-info to batch-command
2022-07-28 23:02 ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
2022-07-29 6:25 ` Ævar Arnfjörð Bjarmason
2022-07-29 18:21 ` Junio C Hamano
@ 2022-09-28 13:12 ` Ævar Arnfjörð Bjarmason
2 siblings, 0 replies; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28 13:12 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, gitster, jonathantanmy, philipoakley, johncai86
On Thu, Jul 28 2022, Calvin Wan wrote:
> + gtransport->smart_options->object_info_data = &remote_object_info;
> + retval = transport_fetch_refs(gtransport, NULL);
I see Phillip Wood noted that this fails SANITIZE=address in
https://lore.kernel.org/git/00a58dbb-32e8-98dd-6c72-235e9c33af3b@gmail.com/;
just a *bump* about this issue. I noticed "seen" failed, bisected down
to 6/6 here, then saw his >month old report...
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (14 preceding siblings ...)
2022-07-28 23:02 ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
@ 2022-07-31 15:02 ` ZheNing Hu
2022-08-08 17:32 ` Calvin Wan
2022-08-13 22:17 ` Junio C Hamano
16 siblings, 1 reply; 103+ messages in thread
From: ZheNing Hu @ 2022-07-31 15:02 UTC (permalink / raw)
To: Calvin Wan
Cc: Git List, Junio C Hamano, Jonathan Tan, Philip Oakley, johncai86,
Taylor Blau
Hi, Calvin Wan,
Calvin Wan <calvinwan@google.com> 于2022年5月3日周二 08:14写道:
>
> Sometimes it is useful to get information about an object without having
> to download it completely. The server logic has already been implemented
> as “a2ba162cda (object-info: support for retrieving object info,
> 2021-04-20)”. This patch implements the client option for it.
>
> Add `--object-info` option to `cat-file --batch-command`. This option
> allows the client to make an object-info command request to a server
> that supports protocol v2. If the server is v2, but does not allow for
> the object-info command request, the entire object is fetched and the
> relevant object info is returned.
>
> Summary of changes ==================
>
> Patches 1, 2, 3, and 7 are small changes that setup the main
> implementation. Patch 4 sets up object-info to be backwards compatible
> for future patch series that adds additional attributes. Patch 5 adds
> internal trasnport functions to send and receive object-info command
> request packets. Patch 6 adds the fallback if object-info is not
> supported or fails. Patch 8 adds the cat-file implementation.
>
I have to say I am very curious about this feature. Since the current
git partial-clone interface supports only a few filters:
blob:limit
blob:none
sparse:oid
tree:0
Though these filters reduce the number of objects downloaded each time,
sometimes I just need *only* one blob object, git partial-clone will still
download some additional commits and tree objects from the remote.
This patch can get the remote-object-info by git cat-file, so can we go
further to get the remote-object-content of the object?
Something like:
$ git cat-file --batch-command
remote-object-content origin <oid>
> Changes since V3 ================
>
> * Object-info is now implemented in cat-file --batch-command rather
> than fetch (new base commit)
> * Removed config option to advertise object-info
> * Added forwards and backwards compability for object-info
> * Split up some patches to better describe and visualize changes
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
>
> Calvin Wan (8):
> fetch-pack: refactor packet writing
> fetch-pack: move fetch default settings
> object-store: add function to free object_info contents
> object-info: send attribute packet regardless of object ids
> transport: add client side capability to request object-info
> transport: add object-info fallback to fetch
> cat-file: move parse_cmd and DEFAULT_FORMAT up
> cat-file: add --batch-command remote-object-info command
>
> Documentation/git-cat-file.txt | 16 +-
> builtin/cat-file.c | 225 ++++++++++++++++-----
> fetch-pack.c | 61 ++++--
> fetch-pack.h | 10 +
> object-file.c | 16 ++
> object-store.h | 3 +
> protocol-caps.c | 14 +-
> t/t1006-cat-file.sh | 347 +++++++++++++++++++++++++++++++++
> transport-helper.c | 7 +-
> transport.c | 97 ++++++++-
> transport.h | 11 ++
> 11 files changed, 728 insertions(+), 79 deletions(-)
>
> --
> 2.36.0.rc2.10170.gb555eefa6f
>
Thanks!
ZheNing Hu
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command
2022-07-31 15:02 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command ZheNing Hu
@ 2022-08-08 17:32 ` Calvin Wan
0 siblings, 0 replies; 103+ messages in thread
From: Calvin Wan @ 2022-08-08 17:32 UTC (permalink / raw)
To: ZheNing Hu
Cc: Git List, Junio C Hamano, Jonathan Tan, Philip Oakley, johncai86,
Taylor Blau
> I have to say I am very curious about this feature. Since the current
> git partial-clone interface supports only a few filters:
>
> blob:limit
> blob:none
> sparse:oid
> tree:0
>
> Though these filters reduce the number of objects downloaded each time,
> sometimes I just need *only* one blob object, git partial-clone will still
> download some additional commits and tree objects from the remote.
>
> This patch can get the remote-object-info by git cat-file, so can we go
> further to get the remote-object-content of the object?
>
> Something like:
>
> $ git cat-file --batch-command
> remote-object-content origin <oid>
I think this can be potentially future work. Part of my hesitation stems
from the fact that there is no need to print out any object info when
downloading an object. Therefore, it seems better suited to first be
implemented in git fetch (or added as a filter to partial-clone). And
then remote-object-content can be a combination of fetch +
remote-object-info.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command
2022-05-02 17:08 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
` (15 preceding siblings ...)
2022-07-31 15:02 ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command ZheNing Hu
@ 2022-08-13 22:17 ` Junio C Hamano
16 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2022-08-13 22:17 UTC (permalink / raw)
To: Calvin Wan; +Cc: git, jonathantanmy, philipoakley, johncai86, me
Style and coccinelle fixes; please squash in when you reroll.
* var = xcalloc(count, sizeof(*var)) --> CALLOC_ARRAY(var, count)
for count != 1
* sizeof (type) --> sizeof(type)
Thanks.
builtin/cat-file.c | 2 +-
transport.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 57c090f249..4afe82322f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -515,7 +515,7 @@ static int get_remote_info(struct batch_options *opt, int argc, const char **arg
size_t j;
int include_size = 0, include_type = 0;
- remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
+ CALLOC_ARRAY(remote_object_info, object_info_oids.nr);
gtransport->smart_options->object_info = 1;
gtransport->smart_options->object_info_oids = &object_info_oids;
/**
diff --git a/transport.c b/transport.c
index 64bcc311ff..87197f0ec7 100644
--- a/transport.c
+++ b/transport.c
@@ -442,7 +442,7 @@ static int fetch_refs_via_pack(struct transport *transport,
struct ref *refs = NULL;
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
- struct ref *object_info_refs = xcalloc(1, sizeof (struct ref));
+ struct ref *object_info_refs = xcalloc(1, sizeof(struct ref));
memset(&args, 0, sizeof(args));
args.uploadpack = data->options.uploadpack;
@@ -479,7 +479,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.quiet = 1;
args.no_progress = 1;
for (i = 0; i < transport->smart_options->object_info_oids->nr; i++) {
- struct ref *temp_ref = xcalloc(1, sizeof (struct ref));
+ struct ref *temp_ref = xcalloc(1, sizeof(struct ref));
temp_ref->old_oid = *(transport->smart_options->object_info_oids->oid + i);
temp_ref->exact_oid = 1;
ref->next = temp_ref;
--
2.37.2-479-gedbcd27e9d
^ permalink raw reply related [flat|nested] 103+ messages in thread