* No progress from push when using bitmaps
@ 2014-03-13  0:21 Shawn Pearce
  2014-03-13 21:26 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Shawn Pearce @ 2014-03-13  0:21 UTC (permalink / raw)
  To: git

Today I tried pushing a copy of linux.git from a client that had
bitmaps into a JGit server. The client stalled for a long time with no
progress, because it reused the existing pack. No progress appeared
while it was sending the existing file on the wire:

  $ git push git://localhost/linux.git master
  Reusing existing pack: 2938117, done.
  Total 2938117 (delta 0), reused 0 (delta 0)
  remote: Resolving deltas:  66% (1637269/2455727)

This is not the best user experience. :-(

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No progress from push when using bitmaps
  2014-03-13  0:21 No progress from push when using bitmaps Shawn Pearce
@ 2014-03-13 21:26 ` Jeff King
  2014-03-13 22:01   ` Shawn Pearce
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2014-03-13 21:26 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote:

> Today I tried pushing a copy of linux.git from a client that had
> bitmaps into a JGit server. The client stalled for a long time with no
> progress, because it reused the existing pack. No progress appeared
> while it was sending the existing file on the wire:
> 
>   $ git push git://localhost/linux.git master
>   Reusing existing pack: 2938117, done.
>   Total 2938117 (delta 0), reused 0 (delta 0)
>   remote: Resolving deltas:  66% (1637269/2455727)
> 
> This is not the best user experience. :-(

Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have
my client repos bitmapped (and on fetch, the interesting progress is
coming from the local index-pack).

It would definitely be good to have throughput measurements while
writing out the pack. However, I'm not sure we have anything useful to
count. We know the total number of objects we're reusing, but we're not
actually parsing the data; we're just blitting it out as a stream. I
think the progress code will need some refactoring to handle a
throughput-only case.

-Peff


* Re: No progress from push when using bitmaps
  2014-03-13 21:26 ` Jeff King
@ 2014-03-13 22:01   ` Shawn Pearce
  2014-03-13 22:07     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Shawn Pearce @ 2014-03-13 22:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Mar 13, 2014 at 2:26 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote:
>
>> Today I tried pushing a copy of linux.git from a client that had
>> bitmaps into a JGit server. The client stalled for a long time with no
>> progress, because it reused the existing pack. No progress appeared
>> while it was sending the existing file on the wire:
>>
>>   $ git push git://localhost/linux.git master
>>   Reusing existing pack: 2938117, done.
>>   Total 2938117 (delta 0), reused 0 (delta 0)
>>   remote: Resolving deltas:  66% (1637269/2455727)
>>
>> This is not the best user experience. :-(
>
> Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have
> my client repos bitmapped (and on fetch, the interesting progress is
> coming from the local index-pack).
>
> It would definitely be good to have throughput measurements while
> writing out the pack. However, I'm not sure we have anything useful to
> count. We know the total number of objects we're reusing, but we're not
> actually parsing the data; we're just blitting it out as a stream. I
> think the progress code will need some refactoring to handle a
> throughput-only case.

Yes. I think JGit suffers from this same bug, and again we never
noticed it because usually only the servers are bitmapped, not the
clients.

pack-objects writes a throughput meter when it's writing objects.
Really just the bytes out/second would be enough to let the user know
the client is working. Unfortunately I think that is still tied to the
overall progress system having some other counter?


* Re: No progress from push when using bitmaps
  2014-03-13 22:01   ` Shawn Pearce
@ 2014-03-13 22:07     ` Jeff King
  2014-03-13 22:23       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jeff King @ 2014-03-13 22:07 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote:

> > It would definitely be good to have throughput measurements while
> > writing out the pack. However, I'm not sure we have anything useful to
> > count. We know the total number of objects we're reusing, but we're not
> > actually parsing the data; we're just blitting it out as a stream. I
> > think the progress code will need some refactoring to handle a
> > throughput-only case.
> 
> Yes. I think JGit suffers from this same bug, and again we never
> noticed it because usually only the servers are bitmapped, not the
> clients.
> 
> pack-objects writes a throughput meter when it's writing objects.
> Really just the bytes out/second would be enough to let the user know
> the client is working. Unfortunately I think that is still tied to the
> overall progress system having some other counter?

Yes, I'm looking at it right now. The throughput meter is actually
connected to the sha1fd output. So really we just need to call
display_progress periodically as we loop through the data. It's a
one-liner fix.

_But_ it still looks ugly, because, as you mention, it's tied to the
progress meter, which is counting up to N objects. So we basically sit
there at "0", pumping data, and then after the pack is done, we can say
we sent N. :)

There are a few ways around this:

  1. Add a new phase "Writing packs" which counts from 0 to 1. Even
     though it's more accurate, moving from 0 to 1 really isn't that
     useful (the throughput is, but the 0/1 just looks like noise).

  2. Add a new phase "Writing reused objects" that counts from 0 bytes
     up to N bytes. This looks stupid, though, because we are repeating
     the current byte count both here and in the throughput.

  3. Use the regular "Writing objects" progress, but fake the object
     count. We know we are writing M bytes with N objects. Bump the
     counter by 1 for every M/N bytes we write.

The first two require some non-trivial surgery to the progress code. I
am leaning towards the third. Not just because it's easy, but because I
think it actually shows the most intuitive display. Yes, it's fudging
the object numbers, but those are largely meaningless anyway (in fact,
it makes them _better_ because now they're even, instead of getting 95%
done and then hitting some blob that is as big as the rest of the repo
combined).

-Peff


* Re: No progress from push when using bitmaps
  2014-03-13 22:07     ` Jeff King
@ 2014-03-13 22:23       ` Junio C Hamano
  2014-03-13 22:24       ` Jeff King
  2014-03-14  9:43       ` Michael Haggerty
  2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2014-03-13 22:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, git

Jeff King <peff@peff.net> writes:

> There are a few ways around this:
>
>   1. Add a new phase "Writing packs" which counts from 0 to 1. Even
>      though it's more accurate, moving from 0 to 1 really isn't that
>      useful (the throughput is, but the 0/1 just looks like noise).
>
>   2. Add a new phase "Writing reused objects" that counts from 0 bytes
>      up to N bytes. This looks stupid, though, because we are repeating
>      the current byte count both here and in the throughput.
>
>   3. Use the regular "Writing objects" progress, but fake the object
>      count. We know we are writing M bytes with N objects. Bump the
>      counter by 1 for every M/N bytes we write.
>
> The first two require some non-trivial surgery to the progress code. I
> am leaning towards the third. Not just because it's easy, but because I
> think it actually shows the most intuitive display. Yes, it's fudging
> the object numbers, but those are largely meaningless anyway (in fact,
> it makes them _better_ because now they're even, instead of getting 95%
> done and then hitting some blob that is as big as the rest of the repo
> combined).

I think the above argument, especially the "fudging but largely
meaningless anyway" part, makes perfect sense.

Thanks for looking into this.


* Re: No progress from push when using bitmaps
  2014-03-13 22:07     ` Jeff King
  2014-03-13 22:23       ` Junio C Hamano
@ 2014-03-13 22:24       ` Jeff King
  2014-03-14  9:43       ` Michael Haggerty
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2014-03-13 22:24 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Thu, Mar 13, 2014 at 06:07:54PM -0400, Jeff King wrote:

>   3. Use the regular "Writing objects" progress, but fake the object
>      count. We know we are writing M bytes with N objects. Bump the
>      counter by 1 for every M/N bytes we write.

Here is that strategy. I think it looks pretty nice, and it seamlessly
handles the case where you have extra objects to send on top of the
reused pack (we just keep the same progress meter counting up).

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 831dd05..f187859 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -709,7 +709,7 @@ static struct object_entry **compute_write_order(void)
 static off_t write_reused_pack(struct sha1file *f)
 {
 	unsigned char buffer[8192];
-	off_t to_write;
+	off_t to_write, total;
 	int fd;
 
 	if (!is_pack_valid(reuse_packfile))
@@ -726,7 +726,7 @@ static off_t write_reused_pack(struct sha1file *f)
 	if (reuse_packfile_offset < 0)
 		reuse_packfile_offset = reuse_packfile->pack_size - 20;
 
-	to_write = reuse_packfile_offset - sizeof(struct pack_header);
+	total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
 
 	while (to_write) {
 		int read_pack = xread(fd, buffer, sizeof(buffer));
@@ -739,10 +739,23 @@ static off_t write_reused_pack(struct sha1file *f)
 
 		sha1write(f, buffer, read_pack);
 		to_write -= read_pack;
+
+		/*
+		 * We don't know the actual number of objects written,
+		 * only how many bytes written, how many bytes total, and
+		 * how many objects total. So we can fake it by pretending all
+		 * objects we are writing are the same size. This gives us a
+		 * smooth progress meter, and at the end it matches the true
+		 * answer.
+		 */
+		written = reuse_packfile_objects *
+				(((double)(total - to_write)) / total);
+		display_progress(progress_state, written);
 	}
 
 	close(fd);
-	written += reuse_packfile_objects;
+	written = reuse_packfile_objects;
+	display_progress(progress_state, written);
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 


* Re: No progress from push when using bitmaps
  2014-03-13 22:07     ` Jeff King
  2014-03-13 22:23       ` Junio C Hamano
  2014-03-13 22:24       ` Jeff King
@ 2014-03-14  9:43       ` Michael Haggerty
  2014-03-14 10:21         ` Duy Nguyen
  2 siblings, 1 reply; 10+ messages in thread
From: Michael Haggerty @ 2014-03-14  9:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, git

On 03/13/2014 11:07 PM, Jeff King wrote:
> On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote:
> 
>>> It would definitely be good to have throughput measurements while
>>> writing out the pack. However, I'm not sure we have anything useful to
>>> count. We know the total number of objects we're reusing, but we're not
>>> actually parsing the data; we're just blitting it out as a stream. I
>>> think the progress code will need some refactoring to handle a
>>> throughput-only case.
>>
>> Yes. I think JGit suffers from this same bug, and again we never
>> noticed it because usually only the servers are bitmapped, not the
>> clients.
>>
>> pack-objects writes a throughput meter when it's writing objects.
>> Really just the bytes out/second would be enough to let the user know
>> the client is working. Unfortunately I think that is still tied to the
>> overall progress system having some other counter?
> 
> Yes, I'm looking at it right now. The throughput meter is actually
> connected to the sha1fd output. So really we just need to call
> display_progress periodically as we loop through the data. It's a
> one-liner fix.
> 
> _But_ it still looks ugly, because, as you mention, it's tied to the
> progress meter, which is counting up to N objects. So we basically sit
> there at "0", pumping data, and then after the pack is done, we can say
> we sent N. :)
> 
> There are a few ways around this:
> 
>   1. Add a new phase "Writing packs" which counts from 0 to 1. Even
>      though it's more accurate, moving from 0 to 1 really isn't that
>      useful (the throughput is, but the 0/1 just looks like noise).
> 
>   2. Add a new phase "Writing reused objects" that counts from 0 bytes
>      up to N bytes. This looks stupid, though, because we are repeating
>      the current byte count both here and in the throughput.
> 
>   3. Use the regular "Writing objects" progress, but fake the object
>      count. We know we are writing M bytes with N objects. Bump the
>      counter by 1 for every M/N bytes we write.

Would it be practical to change it to a percentage of bytes written?
Then we'd have progress info that is both convenient *and* truthful.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/


* Re: No progress from push when using bitmaps
  2014-03-14  9:43       ` Michael Haggerty
@ 2014-03-14 10:21         ` Duy Nguyen
  2014-03-14 15:29           ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Duy Nguyen @ 2014-03-14 10:21 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, Shawn Pearce, git

On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Would it be practical to change it to a percentage of bytes written?
> Then we'd have progress info that is both convenient *and* truthful.

I agreed for a second, then remembered that we don't know the final
pack size until we finish writing it... Not sure if we could estimate
it (cheaply) with good accuracy, though.

If an object is reused, we already know its compressed size. If it's
not reused and is a loose object, we could use the on-disk size. It's
a lot harder to estimate a not-reused, deltified object. All we have
is the uncompressed size and the size of each delta in the delta
chain... Neither gives a good hint of what the compressed size would be.
-- 
Duy


* Re: No progress from push when using bitmaps
  2014-03-14 10:21         ` Duy Nguyen
@ 2014-03-14 15:29           ` Jeff King
  2014-03-14 23:53             ` Duy Nguyen
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2014-03-14 15:29 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Michael Haggerty, Shawn Pearce, git

On Fri, Mar 14, 2014 at 05:21:59PM +0700, Duy Nguyen wrote:

> On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > Would it be practical to change it to a percentage of bytes written?
> > Then we'd have progress info that is both convenient *and* truthful.
> 
> I agreed for a second, then remembered that we don't know the final
> pack size until we finish writing it... Not sure if we could estimate
> it (cheaply) with good accuracy, though.

Right. I'm not sure what Michael meant by "it". We can send a percentage
of bytes written for the reused pack (my option 2), but we do not know
the total bytes for the rest of the objects. So we'd end up with two
progress meters (one for the reused pack, and one for everything else),
both counting up to different endpoints. And it would require quite a
few changes to the progress code.

> If an object is reused, we already know its compressed size. If it's
> not reused and is a loose object, we could use the on-disk size. It's
> a lot harder to estimate a not-reused, deltified object. All we have
> is the uncompressed size and the size of each delta in the delta
> chain... Neither gives a good hint of what the compressed size would be.

Hmm. I think we do have the compressed delta size after having run the
compression phase (because that is ultimately what we compare to find
the best delta). Loose objects are probably the hardest here, as we
actually recompress them (IIRC, because packfiles encode the type/size
info outside of the compressed bit, whereas it is inside for loose
objects; the "experimental loose" format harmonized this, but it never
caught on).

Without doing that recompression, any value you came up with would be an
estimate, though it would be pretty close (not off by more than a few
bytes per object). However, you can't just run through the packing list
and add up the object sizes; you'd need to do a real "dry-run" through
the writing phase. There are probably more I'm missing, but you need at
least to figure out:

  1. The actual compressed size of a full loose object, as described
     above.

  2. The variable-length headers for each object based on its type and
     size.

  3. The final form that the object will take based on what has come
     before. For example, if there is a max pack size, we may split an
     object from its delta base, in which case we have to throw away the
     delta. We don't know where those breaks will be until we walk
     through the whole list.

  4. If an object we attempt to reuse turns out to be corrupted, we
     fall back to the non-reuse code path, which will have a different
     size. So you'd need to actually check the reused object CRCs during
     the dry-run (and for local repacks, not transfers, we actually
     inflate and check the zlib, too, for safety).

So I think it's _possible_. But it's definitely not trivial. For now, I
think it makes sense to go with something like the patch I posted
earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
regression in the bitmaps case. And it does not make it any harder for
somebody to later convert us to a true byte-counter (i.e., it is the
easy half already).

-Peff


* Re: No progress from push when using bitmaps
  2014-03-14 15:29           ` Jeff King
@ 2014-03-14 23:53             ` Duy Nguyen
  0 siblings, 0 replies; 10+ messages in thread
From: Duy Nguyen @ 2014-03-14 23:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Shawn Pearce, git

On Fri, Mar 14, 2014 at 10:29 PM, Jeff King <peff@peff.net> wrote:
>> If an object is reused, we already know its compressed size. If it's
>> not reused and is a loose object, we could use the on-disk size. It's
>> a lot harder to estimate a not-reused, deltified object. All we have
>> is the uncompressed size, and the size of each delta in the delta
>> chain... Neither gives a good hint of what the compressed size would be.
>
> Hmm. I think we do have the compressed delta size after having run the
> compression phase (because that is ultimately what we compare to find
> the best delta).

There are cases where we do not even try to find deltas (large blobs,
files too small, or the -delta attribute). The large blob case is
especially interesting because the progress bar crawls while these
objects are written.

> Loose objects are probably the hardest here, as we
> actually recompress them (IIRC, because packfiles encode the type/size
> info outside of the compressed bit, whereas it is inside for loose
> objects; the "experimental loose" format harmonized this, but it never
> caught on).
>
> Without doing that recompression, any value you came up with would be an
> estimate, though it would be pretty close (not off by more than a few
> bytes per object).

That's my hope, although if they tweak the compression level then the
estimate could be off (gzip -9 and gzip -1 produce a big difference
in size).

> However, you can't just run through the packing list
> and add up the object sizes; you'd need to do a real "dry-run" through
> the writing phase. There are probably more I'm missing, but you need at
> least to figure out:
>
>   1. The actual compressed size of a full loose object, as described
>      above.
>
>   2. The variable-length headers for each object based on its type and
>      size.

We could run through a "typical" repo, calculate the average header
length, then use it for all objects?

>
>   3. The final form that the object will take based on what has come
>      before. For example, if there is a max pack size, we may split an
>      object from its delta base, in which case we have to throw away the
>      delta. We don't know where those breaks will be until we walk
>      through the whole list.

Ah, this could probably be avoided. Max pack size does not apply to
streaming pack-objects, which is where the progress bar is shown most
often. Falling back to the object count in this case does not sound
too bad.

>
>   4. If an object we attempt to reuse turns out to be corrupted, we
>      fall back to the non-reuse code path, which will have a different
>      size. So you'd need to actually check the reused object CRCs during
>      the dry-run (and for local repacks, not transfers, we actually
>      inflate and check the zlib, too, for safety).

Ugh..

>
> So I think it's _possible_. But it's definitely not trivial. For now, I
> think it makes sense to go with something like the patch I posted
> earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
> regression in the bitmaps case. And it does not make it any harder for
> somebody to later convert us to a true byte-counter (i.e., it is the
> easy half already).

Agreed.
-- 
Duy


Thread overview: 10 messages
2014-03-13  0:21 No progress from push when using bitmaps Shawn Pearce
2014-03-13 21:26 ` Jeff King
2014-03-13 22:01   ` Shawn Pearce
2014-03-13 22:07     ` Jeff King
2014-03-13 22:23       ` Junio C Hamano
2014-03-13 22:24       ` Jeff King
2014-03-14  9:43       ` Michael Haggerty
2014-03-14 10:21         ` Duy Nguyen
2014-03-14 15:29           ` Jeff King
2014-03-14 23:53             ` Duy Nguyen
