* There should have be git gc --repack-arguments
From: Bagas Sanjaya @ 2021-04-07 12:10 UTC
To: git

Hi,

I request that git gc should have a --repack-arguments option. The value
of this option should be passed to git repack.

The use case is when I have very large repos (such as GCC and the Linux
kernel) on a server with small RAM (1-2 GB). When doing gc on such a repo,
the repack step may hang because git-repack has to create a single large
packfile which can be larger than available memory (RAM+swap), so it is
necessary to run

  git repack --window-memory=<desired memory usage> --max-pack-size=<desired pack size>

to create split, smaller packs instead.

There should also be a git config item gc.repackArguments, which has the
same effect as git gc --repack-arguments, with the option taking
precedence over the config.

--
An old man doll... just what I always wanted! - Clara

* Re: There should have be git gc --repack-arguments
From: Jeff King @ 2021-04-07 19:37 UTC
To: Bagas Sanjaya; +Cc: git

On Wed, Apr 07, 2021 at 07:10:43PM +0700, Bagas Sanjaya wrote:

> I request that git gc should have a --repack-arguments option. The value
> of this option should be passed to git repack.

I think in general we prefer to make individual options configurable,
rather than having a blanket "pass along these options" argument, for
two reasons:

  - some options may cause the sub-program to behave unexpectedly. E.g.,
    if you put "-a" in the repack-arguments, that may be subverting
    git-gc's assumptions about how repack will behave

  - arguments are a list, not a string. So you have to provide some
    mechanism for splitting them (presumably on whitespace, but what if
    we need quoting)?

> The use case is when I have very large repos (such as GCC and the Linux
> kernel) on a server with small RAM (1-2 GB). When doing gc on such a repo,
> the repack step may hang because git-repack has to create a single large
> packfile which can be larger than available memory (RAM+swap), so it is
> necessary to run git repack --window-memory=<desired memory usage>
> --max-pack-size=<desired pack size> to create split, smaller packs instead.
>
> There should also be a git config item gc.repackArguments, which has the
> same effect as git gc --repack-arguments, with the option taking
> precedence over the config.

You can set pack.windowMemory in your config already, to solve the first
part.

You can also set pack.packSizeLimit for the latter, though I do not
recommend it. It will not help with memory usage (neither while
repacking nor for later commands). We do mmap() the resulting packfiles,
but we rely on the operating system to manage the actual in-RAM working
set (but that is also true with multiple packfiles; we are happy to map
several of them at once).

And it may make your on-disk size much larger. We don't allow deltas
between on-disk packs, which means some objects which could be stored as
deltas won't be. That in turn hurts on a memory-starved system because
we'll need more block cache to perform the same task. It also results in
extra CPU when serving fetches or pushing, since we'll try to find new
deltas between the packs on the fly.

-Peff

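The per-setting approach recommended in this reply can be sketched as follows. This is an illustration, not part of the original message: the throwaway repository, the "Example" identity, and the 100m value are assumptions chosen only so the snippet runs anywhere. pack.windowMemory is the existing config key named in the reply.

```shell
set -e

# Throwaway repository so the sketch does not touch real data (assumption).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "initial"

# Persist a cap on the memory used for the delta-compression window.
# The 100m value is illustrative; pick one that suits the machine.
git config pack.windowMemory 100m

# git gc runs git repack, which picks up the pack.* settings from the config.
git gc --quiet

window_memory=$(git config pack.windowMemory)
echo "pack.windowMemory=$window_memory"
```

Because the value lives in the repository config, later automatic or manual gc runs should see it as well, with no extra command-line plumbing.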
* Re: There should have be git gc --repack-arguments
From: Junio C Hamano @ 2021-04-07 20:40 UTC
To: Jeff King; +Cc: Bagas Sanjaya, git

Jeff King <peff@peff.net> writes:

>> ... git repack ... --max-pack-size=<desired pack size> to create split and
>> smaller packs instead.
> ...
> You can also set pack.packSizeLimit for the latter, though I do not
> recommend it. It will not help with memory usage (neither while
> repacking nor for later commands).

In other words, passing --max-pack-size, whether it is done with a
new --repack-arguments option or it is done with the existing
pack.packSizeLimit configuration, would make things worse.

So in conclusion:

 - attempting to repack everything into one pack on a memory starved
   box would be helped with reduced window memory size.

 - on a small box, it may make sense to avoid repacking everything
   into one in the first place, but we do not want the number of
   packs to grow unbounded.

Would the new geometric repack feature help here, especially for the
latter?

* Re: There should have be git gc --repack-arguments
From: Jeff King @ 2021-04-07 21:37 UTC
To: Junio C Hamano; +Cc: Taylor Blau, Bagas Sanjaya, git

On Wed, Apr 07, 2021 at 01:40:16PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> >> ... git repack ... --max-pack-size=<desired pack size> to create split and
> >> smaller packs instead.
> > ...
> > You can also set pack.packSizeLimit for the latter, though I do not
> > recommend it. It will not help with memory usage (neither while
> > repacking nor for later commands).
>
> In other words, passing --max-pack-size, whether it is done with a
> new --repack-arguments option or it is done with the existing
> pack.packSizeLimit configuration, would make things worse.

Right. I wish we didn't have --max-pack-size at all. I do not think it
is ever a good idea, and it complicates the packing code quite a bit.

These days we have index v2 to let us address more than 4GB in a
packfile. I suppose it's possible you could have a filesystem whose max
file size is smaller than your total packfile, but that seems pretty
unlikely these days (even 32-bit systems tend to have large file
support).

But that's all a tangent. :)

> So in conclusion:
>
>  - attempting to repack everything into one pack on a memory starved
>    box would be helped with reduced window memory size.

Yes, though less than you might think. It is only trying to keep the
memory used by delta compression at bay. The per-object book-keeping
tends to be quite high by itself.

If you are under memory pressure during delta compression, you may also
be better off reducing the number of threads (since each thread is
simultaneously using windowMemory bytes).

>  - on a small box, it may make sense to avoid repacking everything
>    into one in the first place, but we do not want the number of
>    packs to grow unbounded.
>
> Would the new geometric repack feature help here, especially for the
> latter?

Yes, I think it would. You'd perhaps want to generate a multi-pack-index
file, too, to avoid having to look for objects in multiple packs
sequentially (we have a "git repack --write-midx" option on the way, as
well).

-Peff

* Re: There should have be git gc --repack-arguments
From: Junio C Hamano @ 2021-04-07 22:13 UTC
To: Jeff King; +Cc: Taylor Blau, Bagas Sanjaya, git

Jeff King <peff@peff.net> writes:

> On Wed, Apr 07, 2021 at 01:40:16PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> >> ... git repack ... --max-pack-size=<desired pack size> to create split and
>> >> smaller packs instead.
>> > ...
>> > You can also set pack.packSizeLimit for the latter, though I do not
>> > recommend it. It will not help with memory usage (neither while
>> > repacking nor for later commands).
>>
>> In other words, passing --max-pack-size, whether it is done with a
>> new --repack-arguments option or it is done with the existing
>> pack.packSizeLimit configuration, would make things worse.
>
> Right. I wish we didn't have --max-pack-size at all. I do not think it
> is ever a good idea, and it complicates the packing code quite a bit.

I suspect that the original motivation was sneaker-netting on
multiple floppy disks ;-)

>>  - on a small box, it may make sense to avoid repacking everything
>>    into one in the first place, but we do not want the number of
>>    packs to grow unbounded.
>>
>> Would the new geometric repack feature help here, especially for the
>> latter?
>
> Yes, I think it would. You'd perhaps want to generate a multi-pack-index
> file, too, to avoid having to look for objects in multiple packs
> sequentially (we have a "git repack --write-midx" option on the way, as
> well).

Thanks.

* Re: There should have be git gc --repack-arguments
From: Jeff King @ 2021-04-07 22:22 UTC
To: Junio C Hamano; +Cc: Taylor Blau, Bagas Sanjaya, git

On Wed, Apr 07, 2021 at 03:13:39PM -0700, Junio C Hamano wrote:

> >> > You can also set pack.packSizeLimit for the latter, though I do not
> >> > recommend it. It will not help with memory usage (neither while
> >> > repacking nor for later commands).
> >>
> >> In other words, passing --max-pack-size, whether it is done with a
> >> new --repack-arguments option or it is done with the existing
> >> pack.packSizeLimit configuration, would make things worse.
> >
> > Right. I wish we didn't have --max-pack-size at all. I do not think it
> > is ever a good idea, and it complicates the packing code quite a bit.
>
> I suspect that the original motivation was sneaker-netting on
> multiple floppy disks ;-)

That had always been my impression, too. But when I looked in the
archive while writing my earlier reply, most of the discussion near
--max-pack-size had to do with the early index limitations.

If you are sneaker-netting, you are probably better off to just split
the pack at byte boundaries with an external tool anyway, for two
reasons:

  - our max-pack-size is just a guideline. It only splits at object
    boundaries so if you have an object bigger than the max, we'll
    exceed it.

  - dedicated splitting tools often have useful extra features, like
    k-of-n error correction.

Besides, if you are sneaker netting you'd want to use a bundle, and I
don't think bundles support max-pack-size. :)

Anyway, all off-topic but an interesting diversion.

-Peff

* Re: There should have be git gc --repack-arguments
From: Bagas Sanjaya @ 2021-04-09 9:58 UTC
To: Jeff King, Junio C Hamano; +Cc: Taylor Blau, git

On 08/04/21 05.22, Jeff King wrote:

> If you are sneaker-netting, you are probably better off to just split
> the pack at byte boundaries with an external tool anyway, for two
> reasons:
>
>   - our max-pack-size is just a guideline. It only splits at object
>     boundaries so if you have an object bigger than the max, we'll
>     exceed it.
>
>   - dedicated splitting tools often have useful extra features, like
>     k-of-n error correction.
>

What external tools are there for splitting packs? Can packs split
by such tools still be used by Git?

--
An old man doll... just what I always wanted! - Clara

* Re: There should have be git gc --repack-arguments
From: Jeff King @ 2021-04-09 15:49 UTC
To: Bagas Sanjaya; +Cc: Junio C Hamano, Taylor Blau, git

On Fri, Apr 09, 2021 at 04:58:32PM +0700, Bagas Sanjaya wrote:

> On 08/04/21 05.22, Jeff King wrote:
> > If you are sneaker-netting, you are probably better off to just split
> > the pack at byte boundaries with an external tool anyway, for two
> > reasons:
> >
> >   - our max-pack-size is just a guideline. It only splits at object
> >     boundaries so if you have an object bigger than the max, we'll
> >     exceed it.
> >
> >   - dedicated splitting tools often have useful extra features, like
> >     k-of-n error correction.
> >
> What external tools are there for splitting packs? Can packs split
> by such tools still be used by Git?

No, but you can reassemble the parts at the destination before feeding
them to Git. On a system with normal POSIX tools, you can split like:

  git pack-objects --stdout --all </dev/null |
  split -b 1m - split-pack-

and then after transferring split-pack-* (which are individual 1
megabyte files) to the destination, you can do:

  cat split-pack-* | git index-pack -v --stdin

(There's no error correction in split; tools like rar will do that, and
probably others, but it has been ages since I've had to split a file to
meet transfer requirements).

-Peff

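The split-and-reassemble steps above can be tried end to end with the sketch below. It is an illustration under stated assumptions: the source and destination repositories are throwaway directories, the "Example" identity is made up, and the pieces are 64 bytes rather than 1 megabyte so that even a tiny pack is cut into several files.

```shell
set -e

# Throwaway source repo, destination repo, and transfer directory (assumptions).
src=$(mktemp -d)
dst=$(mktemp -d)
parts=$(mktemp -d)

git init -q "$src"
echo "hello" > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"
commit=$(git -C "$src" rev-parse HEAD)

# Stream one pack containing everything reachable from the refs and cut it
# at byte boundaries; the pieces are not valid packs on their own.
git -C "$src" pack-objects --stdout --all -q </dev/null |
    split -b 64 - "$parts/split-pack-"

# At the destination, concatenate the pieces in name order and index the
# resulting stream into the repository's object store.
git init -q "$dst"
cat "$parts"/split-pack-* | git -C "$dst" index-pack --stdin >/dev/null

# The commit object should now be present in the destination.
git -C "$dst" cat-file -t "$commit"
```

The pack carries objects only. Branch and tag names have to be conveyed separately, which is one reason a bundle is the more convenient format for this kind of transfer.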
* Re: There should have be git gc --repack-arguments
From: Bryan Turner @ 2021-04-07 19:38 UTC
To: Bagas Sanjaya; +Cc: Git Users

On Wed, Apr 7, 2021 at 5:10 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> Hi,
>
> I request that git gc should have a --repack-arguments option. The value
> of this option should be passed to git repack.
>
> The use case is when I have very large repos (such as GCC and the Linux
> kernel) on a server with small RAM (1-2 GB). When doing gc on such a repo,
> the repack step may hang because git-repack has to create a single large
> packfile which can be larger than available memory (RAM+swap), so it is
> necessary to run git repack --window-memory=<desired memory usage>
> --max-pack-size=<desired pack size> to create split, smaller packs instead.

I can't speak to the feature request, but since there are
configuration knobs already for both of those, that implies you can
use

  git -c pack.windowMemory=... -c pack.packSizeLimit=... gc

and those configuration settings will be propagated to the git repack
process that git gc runs.

>
> There should also be a git config item gc.repackArguments, which has the
> same effect as git gc --repack-arguments, with the option taking
> precedence over the config.

Passing configuration settings as I show above would already take
precedence over any config file, since config from the command line is
higher priority.

Hope this helps!
Bryan

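The precedence rule described above can be checked with a short sketch. The throwaway repository, the "Example" identity, and the 100m and 50m values are assumptions for illustration. pack.packSizeLimit is left out here because other replies in this thread advise against it.

```shell
set -e

# Throwaway repository (assumption, for illustration only).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "initial"

# Value stored in the repository's config file.
git config pack.windowMemory 100m
stored=$(git config pack.windowMemory)

# A -c setting on the command line wins over the config file for that one
# invocation, and is passed on to the commands that invocation spawns.
effective=$(git -c pack.windowMemory=50m config pack.windowMemory)
echo "stored=$stored effective=$effective"

# The same one-shot override applied to gc, which runs repack underneath.
git -c pack.windowMemory=50m -c pack.threads=1 gc --quiet
```

The override lasts for a single command, so it suits an occasional manual gc on a constrained machine without changing what other users of the repository see.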
* Re: There should have be git gc --repack-arguments
From: Bagas Sanjaya @ 2021-04-08 13:31 UTC
To: Bryan Turner; +Cc: Git Users

On 08/04/21 02.38, Bryan Turner wrote:

> I can't speak to the feature request, but since there are
> configuration knobs already for both of those, that implies you can
> use git -c pack.windowMemory=... -c pack.packSizeLimit=... gc and
> those configuration settings will be propagated to the git repack
> process that git gc runs.

Oops, I overlooked that. Thanks for reminding me!

--
An old man doll... just what I always wanted! - Clara

end of thread (newest message: 2021-04-09 15:49 UTC)

Thread overview: 10+ messages

2021-04-07 12:10 There should have be git gc --repack-arguments Bagas Sanjaya
2021-04-07 19:37 ` Jeff King
2021-04-07 20:40   ` Junio C Hamano
2021-04-07 21:37     ` Jeff King
2021-04-07 22:13       ` Junio C Hamano
2021-04-07 22:22         ` Jeff King
2021-04-09  9:58           ` Bagas Sanjaya
2021-04-09 15:49             ` Jeff King
2021-04-07 19:38 ` Bryan Turner
2021-04-08 13:31   ` Bagas Sanjaya