* [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
@ 2014-04-02  6:39 Jeff King
  2014-04-02 17:39 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2014-04-02  6:39 UTC
  To: git; +Cc: Junio C Hamano

When we are sending a packfile to a remote, we currently try
to reuse a whole chunk of packfile without bothering to look
at the individual objects. This can make things like initial
clones much lighter on the server, as we can just dump the
packfile bytes.

However, it's possible that the other side cannot read our
packfile verbatim. For example, we may have objects stored
as OFS_DELTA, but the client is an antique version of git
that only understands REF_DELTA. We negotiate this
capability over the fetch protocol. A normal pack-objects
run will convert OFS_DELTA into REF_DELTA on the fly, but
the "reuse pack" code path never even looks at the objects.
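
For readers unfamiliar with the two delta encodings, here is a minimal C sketch of how a reader parses a pack entry. It follows the layout in git's pack-format documentation: the type sits in bits 4-6 of the first byte, the size is a base-128 varint stored least-significant group first, and an OFS_DELTA entry is followed by a backwards distance stored most-significant group first with a +1 bias per continuation byte. The function names are illustrative, not git's own, and the sketch assumes well-formed input. A reader that predates OFS_DELTA has no handling for type 6, which is why the server must not hand it such entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Object type codes stored in bits 4-6 of a pack entry's first byte. */
enum {
    OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4,
    OBJ_OFS_DELTA = 6, /* base named by a backwards offset in this pack */
    OBJ_REF_DELTA = 7  /* base named by its 20-byte object name */
};

/* Parse the variable-length type+size header; returns bytes consumed.
 * Sketch only: no bounds checking, input assumed well-formed. */
static size_t parse_entry_header(const unsigned char *buf,
                                 unsigned *type, uint64_t *size)
{
    size_t used = 0;
    unsigned char c = buf[used++];
    unsigned shift = 4;

    *type = (c >> 4) & 7;   /* bits 4-6 */
    *size = c & 15;         /* low 4 bits of the size */
    while (c & 0x80) {      /* MSB set: another size byte follows */
        c = buf[used++];
        *size += (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}

/* Parse the OFS_DELTA distance that follows the header; the base entry
 * starts at (this entry's offset - distance). Returns bytes consumed. */
static size_t parse_ofs_distance(const unsigned char *buf, uint64_t *distance)
{
    size_t used = 0;
    unsigned char c = buf[used++];
    uint64_t d = c & 0x7f;

    while (c & 0x80) {
        /* +1 per continuation byte, so no value has two encodings */
        d += 1;
        c = buf[used++];
        d = (d << 7) + (c & 0x7f);
    }
    *distance = d;
    return used;
}
```

A REF_DELTA entry instead carries the 20-byte base name right after the header, so the two types differ only in how the base is referenced.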

This patch disables packfile reuse if the other side is
missing any capabilities that we might have used in the
on-disk pack. Right now the only one is OFS_DELTA, but we
may need to expand in the future (e.g., if packv4 introduces
new object types).
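
One way to keep such a check extensible as new pack features appear is a feature bitmask: reuse is allowed only when every feature the on-disk pack may contain is also understood by the client. The enum and function below are hypothetical (they are not in git). Like the patch, the sketch assumes the on-disk pack may use every feature rather than inspecting objects, so with only OFS_DELTA defined it reduces to the patch's "return allow_ofs_delta;".

```c
#include <stdbool.h>

/* Hypothetical: features a pack reader must understand to parse our bytes. */
enum pack_feature {
    PACK_FEATURE_OFS_DELTA = 1 << 0 /* client advertised "ofs-delta" */
    /* a future packv4 representation would get its own bit here */
};

/*
 * Blind reuse is safe only if the client understands every feature the
 * on-disk pack might use; any feature missing on the client side forces
 * the slower per-object path, which can rewrite entries as needed.
 */
static bool reuse_allowed(unsigned disk_features, unsigned client_features)
{
    return (disk_features & ~client_features) == 0;
}
```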

We could be more thorough and only disable reuse in this
case when we actually have an OFS_DELTA to send, but:

  1. We almost always will have one, since we prefer
     OFS_DELTA to REF_DELTA when possible. So this case
     would almost never come up.

  2. Looking through the objects defeats the purpose of the
     optimization, which is to do as little work as possible
     to get the bytes to the remote.

Signed-off-by: Jeff King <peff@peff.net>
---
I happened to be fooling around with git v1.4.0 today, and noticed a
problem fetching from GitHub. Pre-OFS_DELTA git versions are ancient by
today's standard, but it's quite easy to remain compatible here, so I
don't see why not. And in theory, alternate implementations might not
understand OFS_DELTA, though in practice I would consider such an
implementation to be pretty crappy.

 builtin/pack-objects.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7950c43..1503632 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2439,12 +2439,23 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 	}
 }
 
+/*
+ * This tracks any options which a reader of the pack might
+ * not understand, and which would therefore prevent blind reuse
+ * of what we have on disk.
+ */
+static int pack_options_allow_reuse(void)
+{
+	return allow_ofs_delta;
+}
+
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
 	if (prepare_bitmap_walk(revs) < 0)
 		return -1;
 
-	if (!reuse_partial_packfile_from_bitmap(
+	if (pack_options_allow_reuse() &&
+	    !reuse_partial_packfile_from_bitmap(
 			&reuse_packfile,
 			&reuse_packfile_objects,
 			&reuse_packfile_offset)) {
-- 
1.9.1.656.ge8a0637

* Re: [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
  2014-04-02  6:39 [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset Jeff King
@ 2014-04-02 17:39 ` Junio C Hamano
  2014-04-04 21:48   ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2014-04-02 17:39 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> When we are sending a packfile to a remote, we currently try
> to reuse a whole chunk of packfile without bothering to look
> at the individual objects. This can make things like initial
> clones much lighter on the server, as we can just dump the
> packfile bytes.
>
> However, it's possible that the other side cannot read our
> packfile verbatim. For example, we may have objects stored
> as OFS_DELTA, but the client is an antique version of git
> that only understands REF_DELTA. We negotiate this
> capability over the fetch protocol. A normal pack-objects
> run will convert OFS_DELTA into REF_DELTA on the fly, but
> the "reuse pack" code path never even looks at the objects.

The above makes it sound like "reuse pack" codepath is broken. Is it
too much hassle to peek at the initial bytes of each object to see
how they are encoded? Would it be possible to convert OFS_DELTA to
REF_DELTA on the fly on that codepath as well, instead of disabling
the reuse altogether?

> This patch disables packfile reuse if the other side is
> missing any capabilities that we might have used in the
> on-disk pack. Right now the only one is OFS_DELTA, but we
> may need to expand in the future (e.g., if packv4 introduces
> new object types).
>
> We could be more thorough and only disable reuse in this
> case when we actually have an OFS_DELTA to send, but:
>
>   1. We almost always will have one, since we prefer
>      OFS_DELTA to REF_DELTA when possible. So this case
>      would almost never come up.
>
>   2. Looking through the objects defeats the purpose of the
>      optimization, which is to do as little work as possible
>      to get the bytes to the remote.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I happened to be fooling around with git v1.4.0 today, and noticed a
> problem fetching from GitHub. Pre-OFS_DELTA git versions are ancient by
> today's standard, but it's quite easy to remain compatible here, so I
> don't see why not. And in theory, alternate implementations might not
> understand OFS_DELTA, though in practice I would consider such an
> implementation to be pretty crappy.
>
>  builtin/pack-objects.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 7950c43..1503632 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2439,12 +2439,23 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
>  	}
>  }
>  
> +/*
> + * This tracks any options which a reader of the pack might
> + * not understand, and which would therefore prevent blind reuse
> + * of what we have on disk.
> + */
> +static int pack_options_allow_reuse(void)
> +{
> +	return allow_ofs_delta;
> +}
> +
>  static int get_object_list_from_bitmap(struct rev_info *revs)
>  {
>  	if (prepare_bitmap_walk(revs) < 0)
>  		return -1;
>  
> -	if (!reuse_partial_packfile_from_bitmap(
> +	if (pack_options_allow_reuse() &&
> +	    !reuse_partial_packfile_from_bitmap(
>  			&reuse_packfile,
>  			&reuse_packfile_objects,
>  			&reuse_packfile_offset)) {

* Re: [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
  2014-04-02 17:39 ` Junio C Hamano
@ 2014-04-04 21:48   ` Jeff King
  2014-04-04 22:28     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2014-04-04 21:48 UTC
  To: Junio C Hamano; +Cc: git

On Wed, Apr 02, 2014 at 10:39:13AM -0700, Junio C Hamano wrote:

> > However, it's possible that the other side cannot read our
> > packfile verbatim. For example, we may have objects stored
> > as OFS_DELTA, but the client is an antique version of git
> > that only understands REF_DELTA. We negotiate this
> > capability over the fetch protocol. A normal pack-objects
> > run will convert OFS_DELTA into REF_DELTA on the fly, but
> > the "reuse pack" code path never even looks at the objects.
> 
> The above makes it sound like "reuse pack" codepath is broken.

It is broken (without this patch), though in practice only for ancient
(pre-1.4.4) clients.

> Is it too much hassle to peek at the initial bytes of each object to
> see how they are encoded? Would it be possible to convert OFS_DELTA to
> REF_DELTA on the fly on that codepath as well, instead of disabling
> the reuse altogether?

It's a mistake to peek ahead of time. Part of the point of the
pack-reuse optimization is to start sending out bytes as soon as
possible, since the network is quite often the bottleneck. So we would
not want to look through all of the to-be-sent data before sending out
the first byte.

We could convert OFS_DELTA to REF_DELTA on the fly. That _may_ have a
performance impact. Right now, we are basically doing the equivalent of
sendfile(), and conversion would involve iterating through each object
and examining the header.  I think that's probably not too bad, though.
The most expensive part of that, stepping to the next object, requires
scanning through the zlib packets, but we should be able to use the
revidx to avoid that.
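
A rough C sketch of the conversion described above, under stated assumptions. The reverse index is modeled as a hypothetical sorted array of entry offsets (git's actual revindex differs in detail), so the end of an entry's compressed data is simply the next entry's start and no zlib stream has to be inflated. The base's 20-byte SHA-1 name is assumed to have already been looked up from the pack's .idx. Only the entry header changes: the compressed delta payload is the same for OFS_DELTA and REF_DELTA, so it could be copied through verbatim.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 object names, as in 2014-era packs */

/*
 * Hypothetical reverse index: entry offsets sorted ascending. Returns the
 * offset where the entry starting at `off` ends (the next entry's start,
 * or `data_end`, i.e. the pack size minus its 20-byte trailer, for the
 * last entry). Returns 0 if `off` is not an entry start; 0 is never a
 * valid end because the pack begins with a 12-byte header.
 */
static uint64_t entry_end(const uint64_t *offsets, size_t n,
                          uint64_t off, uint64_t data_end)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {           /* binary search, no zlib inflation */
        size_t mid = lo + (hi - lo) / 2;
        if (offsets[mid] == off)
            return mid + 1 < n ? offsets[mid + 1] : data_end;
        if (offsets[mid] < off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/*
 * Write a REF_DELTA (type 7) entry header for a delta whose uncompressed
 * size is `size`, followed by the base object's name. Returns bytes
 * written; `out` needs room for at most 10 header bytes plus HASH_LEN.
 */
static size_t write_ref_delta_header(uint64_t size,
                                     const unsigned char base_name[HASH_LEN],
                                     unsigned char *out)
{
    size_t n = 0;
    unsigned char c = (unsigned char)((7u << 4) | (size & 15));

    size >>= 4;
    while (size) {              /* emit remaining size bits, 7 at a time */
        out[n++] = c | 0x80;
        c = (unsigned char)(size & 0x7f);
        size >>= 7;
    }
    out[n++] = c;
    memcpy(out + n, base_name, HASH_LEN);
    return n + HASH_LEN;
}
```

A converted entry would then be the bytes from write_ref_delta_header() followed by the original compressed payload, i.e. everything after the OFS_DELTA header and distance up to entry_end(). Once every OFS_DELTA entry is rewritten this way, no entry refers to another by offset, so the changed entry lengths do not matter.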

I'm not sure it's even worth the code complexity, though. The non-reuse
codepath is not that much slower, and it should be extremely rare for a
client not to support OFS_DELTA these days.

-Peff

* Re: [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
  2014-04-04 21:48   ` Jeff King
@ 2014-04-04 22:28     ` Junio C Hamano
  2014-04-04 23:13       ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2014-04-04 22:28 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We could convert OFS_DELTA to REF_DELTA on the fly. That _may_ have a
> performance impact. Right now, we are basically doing the equivalent of
> sendfile(), and conversion would involve iterating through each object
> and examining the header.  I think that's probably not too bad, though.
> The most expensive part of that, stepping to the next object, requires
> scanning through the zlib packets, but we should be able to use the
> revidx to avoid that.
>
> I'm not sure it's even worth the code complexity, though. The non-reuse
> codepath is not that much slower, and it should be extremely rare for a
> client not to support OFS_DELTA these days.

OK, together with the fact that only ancient versions of fetcher
would trigger this "do not reuse" codepath, I agree that we should
go the simplest route this patch takes.

Thanks.

* Re: [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
  2014-04-04 22:28     ` Junio C Hamano
@ 2014-04-04 23:13       ` Jeff King
  2014-04-07 17:15         ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2014-04-04 23:13 UTC
  To: Junio C Hamano; +Cc: git

On Fri, Apr 04, 2014 at 03:28:48PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We could convert OFS_DELTA to REF_DELTA on the fly. That _may_ have a
> > performance impact. Right now, we are basically doing the equivalent of
> > sendfile(), and conversion would involve iterating through each object
> > and examining the header.  I think that's probably not too bad, though.
> > The most expensive part of that, stepping to the next object, requires
> > scanning through the zlib packets, but we should be able to use the
> > revidx to avoid that.
> >
> > I'm not sure it's even worth the code complexity, though. The non-reuse
> > codepath is not that much slower, and it should be extremely rare for a
> > client not to support OFS_DELTA these days.
> 
> OK, together with the fact that only ancient versions of fetcher
> would trigger this "do not reuse" codepath, I agree that we should
> go the simplest route this patch takes.

By the way, we may want to revisit this if we grow more features that do
not allow straight byte-for-byte reuse. I am thinking specifically if we
grow a packv4-like representation for an object, and we plan to convert
on-the-fly to existing packv2 clients. But I think the sensible steps
for that are:

  1. If we have v4 on disk and are outputting v2, add this case to the
     "can_reuse" function I just added. I.e., start out correct, and
     turn off the optimization.

  2. Experiment with on-the-fly conversion. It may be that the
     conversion is so expensive that the reuse optimization gets lost in
     the noise. Or maybe we can reclaim most of the advantage of the
     reuse code path, and it is worth going object-by-object and
     converting. But we won't know until we can measure.

-Peff

* Re: [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset
  2014-04-04 23:13       ` Jeff King
@ 2014-04-07 17:15         ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2014-04-07 17:15 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Fri, Apr 04, 2014 at 03:28:48PM -0700, Junio C Hamano wrote:
> ...
>> OK, together with the fact that only ancient versions of fetcher
>> would trigger this "do not reuse" codepath, I agree that we should
>> go the simplest route this patch takes.
>
> By the way, we may want to revisit this if we grow more features that do
> not allow straight byte-for-byte reuse. 

True.

> I am thinking specifically if we
> grow a packv4-like representation for an object, and we plan to convert
> on-the-fly to existing packv2 clients. But I think the sensible steps
> for that are:
>
>   1. If we have v4 on disk and are outputting v2, add this case to the
>      "can_reuse" function I just added. I.e., start out correct, and
>      turn off the optimization.
>
>   2. Experiment with on-the-fly conversion. It may be that the
>      conversion is so expensive that the reuse optimization gets lost in
>      the noise. Or maybe we can reclaim most of the advantage of the
>      reuse code path, and it is worth going object-by-object and
>      converting. But we won't know until we can measure.

Yeah; I think these are sensible steps in the future direction.

Thanks.

Thread overview: 6+ messages
2014-04-02  6:39 [PATCH] pack-objects: do not reuse packfiles without --delta-base-offset Jeff King
2014-04-02 17:39 ` Junio C Hamano
2014-04-04 21:48   ` Jeff King
2014-04-04 22:28     ` Junio C Hamano
2014-04-04 23:13       ` Jeff King
2014-04-07 17:15         ` Junio C Hamano
