git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Son Luong Ngoc <sluongng@gmail.com>, gitgitgadget@gmail.com
Cc: dstolee@microsoft.com, git@vger.kernel.org, jrnieder@google.com,
	peff@peff.net
Subject: Re: [PATCH 06/15] run-job: auto-size or use custom pack-files batch
Date: Thu, 30 Apr 2020 16:13:48 -0400
Message-ID: <f7a193f8-8d5f-3f71-45b1-e117742df8f9@gmail.com>
In-Reply-To: <CAL3xRKcsa_P6X5Y+c2LWoftfjqEw9eheikrxfwXU=y6KuFHjtQ@mail.gmail.com>

On 4/30/2020 12:48 PM, Son Luong Ngoc wrote:
> Hi Derrick,
> 
> I have been reviewing these jobs' mechanics closely and have some questions:
> 
>> The dynamic default size is computed with this idea in mind for
>> a client repository that was cloned from a very large remote: there
>> is likely one "big" pack-file that was created at clone time. Thus,
>> do not try repacking it as it is likely packed efficiently by the
>> server. Instead, try packing the other pack-files into a single
>> pack-file.
>>
>> The size is then computed as follows:
>>
>> batch size = total size - max pack size
> 
> Could you please elaborate on why this is the best value?

The intention was to repack everything _except_ the biggest pack,
but clearly that doesn't always work. There is some logic to "guess"
the size of the resulting pack that doesn't always reach the total
batch size, so nothing happens. More investigation is required here.
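
A minimal sketch of the "total size - max pack size" formula, assuming
pack sizes in bytes arrive one per line on stdin. The helper name
auto_batch_size and the sample sizes are hypothetical and only
illustrate the arithmetic, not the C code in the patch:

```shell
#!/bin/sh
# Hypothetical helper: read one pack size (bytes) per line on stdin and
# print "total - max", the batch size given by the formula quoted above.
auto_batch_size() {
    awk '
        { total += $1; if ($1 > max) max = $1 }
        END { printf "%.0f\n", total - max }
    '
}

# Made-up sizes: one large clone-time pack and three small packs.
printf '%s\n' 1000000 3000 2000 1000 | auto_batch_size   # prints 6000
```

On a real repository the sizes could come from something like
`du -b .git/objects/pack/*.pack | awk '{print $1}'`, as in the script
quoted below (`du -b` is a GNU option).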

> In practice I have been testing this out with the following
> 
>> % cat debug.sh
>> #!/bin/bash
>>
>> temp=$(du -cb .git/objects/pack/*.pack)
>>
>> total_size=$(echo "$temp" | grep total | awk '{print $1}')
>> echo total_size
>> echo $total_size
>>
>> biggest_pack=$(echo "$temp" | sort -n | tail -2 | head -1 | awk '{print $1}')
>> echo biggest pack
>> echo $biggest_pack
>>
>> batch_size=$(expr $total_size - $biggest_pack)
>> echo batch size
>> echo $batch_size
> 
> If you were to run
> 
>> git multi-pack-index repack --batch-size=$(./debug.sh | tail -1)
> 
> then nothing would be repacked.
> 
> Instead, I have had a lot more success with the following
> 
>> # Get the 2nd biggest pack size (in bytes) + 1
>> $(( $(du -b .git/objects/pack/*pack | sort -n | tail -2 | head -1 | awk '{print $1}') + 1 ))
> 
> I think you also used this approach in t5319 when you used the 3rd
> biggest pack size

The "second biggest pack" is an interesting approach. At first glance it
seems like we will stabilize with one big pack and many similarly-sized
packs. However, small deviations in size are inevitable, and each one
will cause two or more packs to combine and create a "new second
biggest" pack.
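
For comparison, a sketch of the "second biggest pack size + 1"
threshold under the same assumptions (sizes in bytes, one per line on
stdin). The helper name second_biggest_plus_one and the sample sizes
are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print (second-largest pack size + 1).
# With a single input line, that line is treated as the second largest.
second_biggest_plus_one() {
    sort -n | tail -n 2 | head -n 1 | awk '{ printf "%.0f\n", $1 + 1 }'
}

# Made-up sizes: one large clone-time pack and three small packs.
printf '%s\n' 1000000 3000 2000 1000 | second_biggest_plus_one   # prints 3001
```

Whenever the largest pack is strictly larger than the second largest,
every other pack falls below this threshold and the largest pack does
not.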

> Looking forward to a re-roll of this RFC.

I do plan to submit a new version of the RFC, but it will look quite
different based on the feedback so far. I'm still digesting that
feedback and will take another attempt at it after I wrap up some other
items that have my attention currently.

Thanks!
-Stolee



Thread overview: 3+ messages
2020-04-30 16:48 [PATCH 06/15] run-job: auto-size or use custom pack-files batch Son Luong Ngoc
2020-04-30 20:13 ` Derrick Stolee [this message]
  -- strict thread matches above, loose matches on Subject: below --
2020-04-03 20:47 [PATCH 00/15] [RFC] Maintenance jobs and job runner Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 06/15] run-job: auto-size or use custom pack-files batch Derrick Stolee via GitGitGadget
