* Partial Clone garbage collection?
From: Simon Holmberg @ 2019-10-30 17:08 UTC
To: git

I've been experimenting with the new Partial Clone feature, attempting to use it to filter out the otherwise full history of the large binary resources in our repos. It works really well on the initial clone. But once you start jumping around in history a lot, the repo will grow out of proportion again as promised pack files are fetched.

Are there any plans to add a --filter parameter to git gc as well, that would be able to prune past history of objects and convert them back into pack promises? Or am I wrong in assuming that this could ever act as a native replacement for LFS? Without this, a repo would only continue to grow ad infinitum, resulting in the same issues as before unless you actively choose to delete your entire clone and re-clone it from upstream once in a while.
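For readers unfamiliar with the setup Simon describes, here is a minimal sketch of how a blob-filtered clone command might be assembled. The filter specs `blob:none` and `blob:limit=<size>` are existing `git clone --filter` values; the helper name `partial_clone_command`, the URL, and the directory are placeholders for illustration only, and the remote has to allow filtered fetches for the clone to be partial.

```python
def partial_clone_command(url, directory, blob_limit=None):
    """Build the argv for a blob-filtered partial clone.

    blob_limit=None omits every blob until it is needed;
    a value such as "1m" omits only blobs at or above that size.
    """
    spec = "blob:none" if blob_limit is None else f"blob:limit={blob_limit}"
    return ["git", "clone", f"--filter={spec}", url, directory]


if __name__ == "__main__":
    # Placeholder URL and directory, printed rather than executed.
    print(" ".join(partial_clone_command("https://example.com/repo.git", "repo", "1m")))
```

The list could be passed to `subprocess.run` to perform the clone. Blobs omitted this way are fetched on demand later, which is the growth Simon is asking about.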
* Re: Partial Clone garbage collection?
From: без имени @ 2019-10-30 18:17 UTC
To: Simon Holmberg, git

> Are there any plans to add a --filter parameter to git gc as well,
> that would be able to prune past history of objects and convert them
> back into pack promises?

This operation will change the hash, and hence the history. I tried to draw attention to a specific trick:

https://public-inbox.org/git/25196441572039199@iva5-58d151f416d2.qloud-c.yandex.net/T/#t

but it did not get any support.

--
zvezdochiot
* Re: Partial Clone garbage collection?
From: Jeff King @ 2019-10-30 20:38 UTC
To: без имени
Cc: Simon Holmberg, git

On Wed, Oct 30, 2019 at 09:17:02PM +0300, без имени wrote:

> > Are there any plans to add a --filter parameter to git gc as well,
> > that would be able to prune past history of objects and convert them
> > back into pack promises?
>
> This operation will change the hash, and hence the history. I tried to
> draw attention to a specific trick:

I don't think this is quite the same thing. Simon is just talking about the partial-clone system. The files are still mentioned in history, but the client does not store the matching blobs themselves. Instead, it can fetch them on-demand from a remote repository.

-Peff
* Re: Partial Clone garbage collection?
From: Jeff King @ 2019-10-30 20:37 UTC
To: Simon Holmberg
Cc: git

On Wed, Oct 30, 2019 at 06:08:18PM +0100, Simon Holmberg wrote:

> I've been experimenting with the new Partial Clone feature, attempting
> to use it to filter out the otherwise full history of the large binary
> resources in our repos. It works really well on the initial clone. But
> once you start jumping around in history a lot, the repo will grow out
> of proportion again as promised pack files are fetched.
>
> Are there any plans to add a --filter parameter to git gc as well,
> that would be able to prune past history of objects and convert them
> back into pack promises? Or am I wrong in assuming that this could
> ever act as a native replacement for LFS? Without this, a repo would
> only continue to grow ad infinitum, resulting in the same issues as
> before unless you actively chose to delete your entire clone and
> re-clone it from upstream once in a while.

I don't recall seeing anybody actively working on this, but I think it would be a good idea. You'd probably want to be able to specify it in your config somehow, so that subsequent repacks pruned as necessary without you having to remember to do it each time.

You could naively just drop everything that matches the filter, and then re-fetch it as needed. But for efficiency, you may want to keep some other objects:

  - objects mentioned directly in the index, or the tree of HEAD; you'd end up re-fetching these next time you "git checkout"

  - perhaps objects fetched recently are more worth keeping (e.g., ones with an mtime less than a day or two). I don't know if that helps, though. What you really care about is how recently they were accessed (assuming there's some locality there), not written. A frequently-accessed object may have been fetched immediately after you cloned, giving it an old mtime.

Since we can get any of the objects again if we want and we're just optimizing, this is really just a cache-expiration problem. But it may be hard to implement any of the stock algorithms without having logs of which objects were accessed.

-Peff
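The retention heuristics in the message above can be sketched as a small selection function. This is only an illustration of the idea, not git code: `CachedObject`, `select_objects_to_drop`, the `pinned` set (standing in for objects reachable from the index or the tree of HEAD), and the age threshold are all hypothetical names and parameters. It uses mtime as the recency signal, which the message itself notes is only a rough proxy for access time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedObject:
    oid: str      # object id (placeholder strings in this sketch)
    size: int     # size in bytes
    mtime: float  # when the object was written locally, seconds since the epoch


def select_objects_to_drop(objects, pinned, matches_filter, now, max_age_seconds):
    """Return the ids of objects a filtered repack could drop and re-fetch later."""
    drop = []
    for obj in objects:
        if not matches_filter(obj):
            continue  # not covered by the filter, so it is always kept
        if obj.oid in pinned:
            continue  # in the index or HEAD's tree; the next checkout would re-fetch it
        if now - obj.mtime < max_age_seconds:
            continue  # written recently; a rough stand-in for "recently used"
        drop.append(obj.oid)
    return sorted(drop)
```

A real implementation would need access logs to apply a standard cache-expiration algorithm such as LRU, since mtime records when an object was written rather than when it was last read.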
* Re: Partial Clone garbage collection?
From: Jonathan Tan @ 2019-10-30 20:45 UTC
To: peff
Cc: simon.holmberg, git, Jonathan Tan

> I don't recall seeing anybody actively working on this, but I think it
> would be a good idea.

I agree with this.

> You could naively just drop everything that matches the filter, and then
> re-fetch it as needed.

We should also retain the promisor objects that are not referenced by any other promisor object, regardless of whether they match the filter. (If not, the resulting repository will not pass fsck.)

The place to make this change is most likely in repack_promisor_objects() in builtin/repack.c. Currently, it just repacks all known promisor objects into one pack. Modifying it to only repack the ones we want (and adding a CLI option etc.) should be sufficient.
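The retention rule described above can be illustrated with a toy object graph. This is a sketch under stated assumptions, not the logic of repack_promisor_objects(): the dictionary representation, the function names, and the object ids are all hypothetical. The graph maps each promisor object to the objects it references, and `matches_filter` says whether an object is a candidate for omission.

```python
def unreferenced_promisor_objects(promisor_refs):
    """Return promisor objects that no other promisor object references."""
    referenced = set()
    for oid, targets in promisor_refs.items():
        for target in targets:
            if target != oid:  # ignore self-references
                referenced.add(target)
    return {oid for oid in promisor_refs if oid not in referenced}


def promisor_objects_to_keep(promisor_refs, matches_filter):
    """Keep every unreferenced object, plus everything the filter does not match."""
    roots = unreferenced_promisor_objects(promisor_refs)
    return {
        oid
        for oid in promisor_refs
        if oid in roots or not matches_filter(oid)
    }
```

In this model, an object that matches the filter is dropped only if some retained promisor object still refers to it, so the local repository always keeps a record that the object is promised by the remote.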