* Migration to Git LFS inflates repository multiple times
From: Mateusz Loskot @ 2018-11-11 23:47 UTC
  To: git

Hi,

I'm posting here for the first time and I hope it's the right place to ask
questions about Git LFS.

TL;DR: Is it normal for a repository migrated to Git LFS to inflate multiple times,
and how do I deal with it?

I'm migrating a big SVN repository to Git.
In SVN, a collection of third-party SDKs is maintained along with the codebase.
Many of the third-party libraries come in binary form.
So, I'm migrating binary files of those to Git LFS.

I'm following the Git LFS tutorial,
section "Migrating existing repository data to LFS"
https://github.com/git-lfs/git-lfs/wiki/Tutorial

First, I ran the initial translation of the SVN repo into Git.
The new repository is a Git bare repository.
There are 5 branches and 10+ tags in the proj.git repo.

It is quite large:

proj.git (BARE:master) $ du -sh
19G

Next, I performed the following sequence of steps to optimise it
and migrate to Git LFS:

1. Optimise the repo

proj.git (BARE:master) $ git gc
Enumerating objects: 1432599, done.
Counting objects: 100% (1432599/1432599), done.
Delta compression using up to 48 threads
Compressing objects: 100% (864524/864524), done.
Writing objects: 100% (1432599/1432599), done.
Total 1432599 (delta 541698), reused 1405922 (delta 525738)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1432599, done.

proj.git (BARE:master) $ du -sh
11G

2. List the file types taking up the most space in the repo

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.lib   27 GB       3524/3524 files(s)  100%
*.pdb   5.6 GB      1412/1412 files(s)  100%
*.cpp   4.8 GB  131848/131854 files(s)  100%
*.exe   2.3 GB        798/798 files(s)  100%
*.dll   2.0 GB      1000/1000 files(s)  100%

3. Migrate the repo to Git LFS

proj.git (BARE:master) $ git lfs migrate import \
    --include="*.exe,*.dll,*.lib,*.pdb,*.zip" --everything

4. Check size of the repo after migration to Git LFS

proj.git (BARE:master) $ du -sh
47G

5. Cleaning up the `.git` directory after migration to Git LFS

proj.git (BARE:master) $ git reflog expire --expire-unreachable=now --all

proj.git (BARE:master) $ git gc --prune=now --aggressive
Enumerating objects: 1462310, done.
Counting objects: 100% (1462310/1462310), done.
Delta compression using up to 48 threads
Compressing objects: 100% (1422322/1422322), done.
Writing objects: 100% (1462310/1462310), done.
Total 1462310 (delta 577640), reused 845097 (delta 0)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1462310, done.

6. Check final disk size of the repo

proj.git (BARE:master) $ du -sh
39G

7. List the file types taking up the most space in the repository
after migration to Git LFS

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.cpp   4.8 GB  131848/131854 files(s)  100%
*.png   1.1 GB  696499/696499 files(s)  100%
*.h     828 MB    86386/86471 files(s)  100%
*.csv   820 MB        939/939 files(s)  100%
*.html  686 MB    34126/34126 files(s)  100%
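
The pattern list passed to --include in step 3 can be derived from the table printed in step 2. A minimal POSIX shell sketch follows; the function name "lfs_info_patterns" is hypothetical, and it assumes the output layout shown above (progress lines starting with "migrate:", every other line starting with a file pattern).

```sh
# lfs_info_patterns (hypothetical helper): read the table printed by
# "git lfs migrate info" on stdin and print its pattern column as one
# comma-separated list, e.g. for "git lfs migrate import --include=...".
# Assumes the layout shown in this thread: progress lines start with
# "migrate:", every other non-blank line starts with a pattern such as "*.lib".
# Optional $1: an extended regular expression; matching patterns are dropped.
lfs_info_patterns() {
    awk -v exclude="${1:-}" '
        /^migrate:/ { next }                       # skip progress lines
        NF == 0 { next }                           # skip blank lines
        exclude != "" && $1 ~ exclude { next }     # drop excluded patterns
        { list = (list == "" ? $1 : list "," $1) } # append pattern
        END { if (list != "") print list }
    '
}
```

For example, git lfs migrate info --everything | lfs_info_patterns '[.]cpp$' would turn the step 2 table into *.lib,*.pdb,*.exe,*.dll. Since "migrate info" only lists the largest few types (five in the output above), the result lacks *.zip and is a starting point to review, not a final list. Running the same function on the step 7 output is a quick way to confirm that none of the migrated patterns remain among the largest Git-stored types.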


Now, I'm looking for answers to the following questions:

1. Is the procedure presented above correct to migrate (SVN ->) Git -> Git LFS?

2. Given that the initial translation to Git generated a 19 GB repo
(optimised to 11 GB),
is it normal that the Git LFS migration inflates the repository
to 47 GB (optimised to 39 GB)?

3. Why does the inflation happen? Is it a function of the number of branches?
   How should I understand the jump from 11 GB to 39 GB?

4. How can I optimise the repository to cut the size down further?

My next step is to somehow push the fat pig into GitHub, Bitbucket or
Azure DevOps ;-)

I've used Git for a few years, but I'm pretty much a newbie regarding low-level
or administration tasks, so I might have made basic errors.
I'll be thankful for any feedback.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
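
For question 3, one way to see where the 39 GB actually lives is to measure Git's object database and the local LFS object store separately. A minimal POSIX shell sketch follows; the function name "repo_space_breakdown" is hypothetical, and it assumes the default LFS storage location (an "lfs" directory inside the Git directory, i.e. proj.git/lfs for this bare repository).

```sh
# repo_space_breakdown (hypothetical helper): report disk usage of Git's
# object database ("objects") and of the local Git LFS store ("lfs") for a
# Git directory, e.g. a bare repository such as proj.git.
# Assumes the default LFS storage location (<git-dir>/lfs); if lfs.storage is
# configured elsewhere, the "lfs" figure will not reflect it.
repo_space_breakdown() {
    repo_dir=${1:-.}
    for sub in objects lfs; do
        if [ -d "$repo_dir/$sub" ]; then
            kb=$(du -sk "$repo_dir/$sub" | cut -f1)   # size in KiB
        else
            kb=0                                      # directory absent
        fi
        printf '%s\t%s KiB\n' "$sub" "$kb"
    done
}
```

Run it as repo_space_breakdown proj.git. "git count-objects -vH" gives a complementary view of the packed Git objects alone.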


* Re: Migration to Git LFS inflates repository multiple times
From: Jeff King @ 2018-11-12 12:30 UTC
  To: Mateusz Loskot; +Cc: git

On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:

> Hi,
> 
> I'm posting here for the first time and I hope it's the right place to ask
> questions about Git LFS.
> 
> TL;DR: Is it normal for a repository migrated to Git LFS to inflate multiple times,
> and how do I deal with it?

That does sound odd to me. People with more LFS experience can probably
give you better answers, but one thought occurred to me: does LFS
store backup copies of the original refs that it rewrites (similar to
the way filter-branch stores refs/original)?

If so, then the resulting repo has the new history _and_ the old
history. Which might mean storing those large blobs both as Git objects
(for the old history) and in an LFS cache directory (for the new
history).

And the right next step is probably to delete those backup refs, and
then "git gc --prune=now". Hmm, actually thinking about it, reflogs
could be making the old history reachable, too.

Try looking at the output of "git for-each-ref" and seeing if there are
any backup refs. After deleting them (or confirming that there aren't),
prune the reflogs with:

  git reflog expire --expire-unreachable=now --all

and then "git gc --prune=now".

-Peff
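
The first suggestion above, looking for refs that are neither branches nor tags, can be scripted. A minimal POSIX shell sketch follows; the function name "non_branch_tag_refs" is hypothetical, and it simply filters ref names read from standard input.

```sh
# non_branch_tag_refs (hypothetical helper): read ref names, one per line, on
# stdin and print those outside refs/heads/ and refs/tags/, i.e. candidates
# for backup or tool-specific refs (refs/original/ from filter-branch,
# refs/svn/ from an SVN import, ...) that can keep old history reachable.
non_branch_tag_refs() {
    # grep exits non-zero when nothing is printed; that is not an error here
    grep -v -E '^refs/(heads|tags)/' || true
}
```

For example: git for-each-ref --format='%(refname)' | non_branch_tag_refs. Anything it prints may be keeping old history, and the original large blobs, reachable until the ref is deleted and the reflogs are expired as described above.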


* Re: Migration to Git LFS inflates repository multiple times
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 12:42 UTC
  To: Jeff King; +Cc: Mateusz Loskot, git


On Mon, Nov 12 2018, Jeff King wrote:

> On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:
>
>> Hi,
>>
>> I'm posting here for the first time and I hope it's the right place to ask
>> questions about Git LFS.
>>
>> TL;DR: Is it normal for a repository migrated to Git LFS to inflate multiple times,
>> and how do I deal with it?
>
> That does sound odd to me. People with more LFS experience can probably
> give you better answers, but one thought occurred to me: does LFS
> store backup copies of the original refs that it rewrites (similar to
> the way filter-branch stores refs/original)?
>
> If so, then the resulting repo has the new history _and_ the old
> history. Which might mean storing those large blobs both as Git objects
> (for the old history) and in an LFS cache directory (for the new
> history).
>
> And the right next step is probably to delete those backup refs, and
> then "git gc --prune=now". Hmm, actually thinking about it, reflogs
> could be making the old history reachable, too.
>
> Try looking at the output of "git for-each-ref" and seeing if there are
> any backup refs. After deleting them (or confirming that there aren't),
> prune the reflogs with:
>
>   git reflog expire --expire-unreachable=now --all
>
> and then "git gc --prune=now".

Even if it's only the most recent version of each file, this could also
be explained by LFS storing each file inflated as-is on disk, whereas
Git stores them delta-compressed.

According to the initial e-mail, "*.exe,*.dll,*.lib,*.pdb,*.zip" was
added to LFS. Depending on their content, those files might
delta-compress somewhat better than random data.
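
This explanation can be checked roughly against the numbers in the original message. Summing the step 2 sizes of the migrated types gives 27 + 5.6 + 2.3 + 2.0 = 36.9 GB (no *.zip row was listed), versus the 11 GB the whole packed repository occupied before the migration. If "git lfs migrate info" reports the full size of every version of each file across history (an assumption here), then storing each of those versions as a separate file in the LFS store could plausibly account for most of the jump to 39 GB. The hypothetical helper below does the same arithmetic on saved "migrate info" output; treating 1 GB as 1024 MB is also an assumption and only affects rows reported in MB or KB.

```sh
# lfs_info_total_gb (hypothetical helper): sum, in GB, the size column of
# "git lfs migrate info" output (read on stdin) for the comma-separated
# patterns given in $1. Assumes rows look like "<pattern> <number> <unit> ..."
# and that 1 GB = 1024 MB = 1024*1024 KB (the unit base is an assumption).
lfs_info_total_gb() {
    awk -v pats="$1" '
        BEGIN { n = split(pats, want, ",") }
        /^migrate:/ { next }                     # skip progress lines
        {
            for (i = 1; i <= n; i++) {
                if ($1 != want[i]) continue      # not a requested pattern
                size = $2
                if ($3 == "MB") size /= 1024
                else if ($3 == "KB") size /= 1024 * 1024
                else if ($3 == "B") size /= 1024 * 1024 * 1024
                total += size                    # "GB" rows are used as-is
            }
        }
        END { printf "%.1f\n", total }
    '
}
```

With the step 2 output saved in info.txt, lfs_info_total_gb '*.exe,*.dll,*.lib,*.pdb,*.zip' < info.txt prints 36.9.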


* Re: Migration to Git LFS inflates repository multiple times
From: Mateusz Loskot @ 2018-11-13  0:39 UTC
  To: git

On Mon, 12 Nov 2018 at 13:31, Jeff King <peff@peff.net> wrote:
> On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:
> >
> > TL;DR: Is it normal for a repository migrated to Git LFS to inflate multiple times,
> > and how do I deal with it?
>
> That does sound odd to me. People with more LFS experience can probably
> give you better answers

FYI, I forwarded my question to https://github.com/git-lfs/git-lfs/issues/3374

> but one thought occurred to me: does LFS
> store backup copies of the original refs that it rewrites (similar to
> the way filter-branch stores refs/original)?

I don't think I see any backup refs (see below for the full list).
But I may be misunderstanding what they are or how to look for them.

> history. Which might mean storing those large blobs both as Git objects
> (for the old history) and in an LFS cache directory (for the new
> history).

Yes, it makes sense.

> And the right next step is probably to delete those backup refs, and
> then "git gc --prune=now". Hmm, actually thinking about it, reflogs
> could be making the old history reachable, too.
>
> Try looking at the output of "git for-each-ref" and seeing if there are
> any backup refs.

I see. Here is the (long!) list of everything I found:

proj.git (BARE:master) $ git for-each-ref
c718eadcf8d09d68c385f0a9355a2c871474fb43 commit refs/heads/1.0
daa75889053b70515179e334cbe3fe6fc7873ff3 commit refs/heads/1.1
cb70db292c1f0c62170d05ffa8dad3c87a6f8ebd commit refs/heads/2.0
f1597e80fcea16bec96dc43f7ab706616126305b commit refs/heads/3.0
1d9e4813ae2fdc5c2b52f7115facda9059b009dc commit refs/heads/master
f41edf37e9a4120bc5d5d66b29d110d403b8db9b commit refs/svn/attic/tags/1.0.1006/6674
850166b21f27447c6b503bb753c454ccedcea8ef commit refs/svn/attic/tags/1.0.216/1291
8a24407f2df0ea7a401fbc08b387387538912642 commit refs/svn/attic/tags/1.0.252/1543
771d81b0756d6ff7d73779ed79f49a607bffb80e commit refs/svn/attic/tags/1.0.299/1883
10925d0fe0de090d4de109fd6403b86d014d6a21 commit refs/svn/attic/tags/1.0.342/2288
ca8dc7b243d002ac6b27f219a6172d36e7885ac1 commit refs/svn/attic/tags/1.0.391/2470
79ebabed25d31dfa34ad68e58fa9327f71928df1 commit refs/svn/attic/tags/1.0.433/2657
d3ed45804843c5aa153810b7494e2c7c0b842c82 commit refs/svn/attic/tags/1.0.450/2724
088af5dbb225cb8dfbeafb5b63158234f4e4017d commit refs/svn/attic/tags/1.0.502/2967
5da8598ed98a5a6610108d67849955da14f9d5b8 commit refs/svn/attic/tags/1.0.546/3212
c3463397337f9f6e5f9d8e64cd79c013fd798bc8 commit refs/svn/attic/tags/1.0.615/3470
673b2f93fda830cc8f28d436abead5fd54baa361 commit refs/svn/attic/tags/1.0.657/3704
247cf24b90afd39f4f5dd7a27cf6b74483215285 commit refs/svn/attic/tags/1.0.662/3725
e2a1609bb6b15ee767565ca0ff152eae3f72a76b commit refs/svn/attic/tags/1.0.673/3820
48033fb9046b1ee60ad9b73073e9185c31ee4568 commit refs/svn/attic/tags/1.0.742/4325
d7b566a275209d0aebba39c7a4028c9dcfb8a468 commit refs/svn/attic/tags/1.1.1141/13525
80f922becfc406420d2f14543e6da684f7377504 commit refs/svn/attic/tags/1.1.1535/16534
252481191cacdef0e77eb6ec02c98b07fca7bc77 commit refs/svn/attic/tags/1.1.1582/16435
601f072d559b664c101d89f3445ed3d00f4ef5dd commit refs/svn/attic/tags/1.1.939/12077
417fb23d71cab30e2c6218faed6f86021b67ca25 commit refs/svn/attic/tags/2.0.1156/21143
5539fbbe0078b782af02d46e0c1abc86ce3d5902 commit refs/svn/map
c718eadcf8d09d68c385f0a9355a2c871474fb43 commit refs/svn/root/branches/1.0
daa75889053b70515179e334cbe3fe6fc7873ff3 commit refs/svn/root/branches/1.1
cb70db292c1f0c62170d05ffa8dad3c87a6f8ebd commit refs/svn/root/branches/2.0
f1597e80fcea16bec96dc43f7ab706616126305b commit refs/svn/root/branches/3.0
e946e96ce2b37a771769196027ae87b8f24181e0 commit refs/svn/root/tags/1.0.1058
06928f42664384bd5e24c115f9c23acc2fd949da commit refs/svn/root/tags/1.0.1240
9f059337974aa195386c5f3ee21957551624aa27 commit refs/svn/root/tags/1.0.1653
db98d68f93d2e9127ab766d3bbe6c933ec169d29 commit refs/svn/root/tags/1.0.764
20ad69c6b94bc8b73b613b5c21d367f22f423501 commit refs/svn/root/tags/1.1.1163
458be535d7f0bc512e800759759e86a211a418b6 commit refs/svn/root/tags/1.1.1290
df045ab97bee94e8cfe72b70802b837719899587 commit refs/svn/root/tags/1.1.1556
cd8ce83868016a0854c0f3cf1b23ea68a32674a2 commit refs/svn/root/tags/1.1.1706
fcd0801b93f48bd46b276ccb82678a70f11fc3ca commit refs/svn/root/tags/1.1.1809
5df0902cfe973b0a041409ec2e8d2314f2b8031e commit refs/svn/root/tags/1.1.2368
c9a06b4f43bab77ed283fe2736ab5c865e03026e commit refs/svn/root/tags/1.1.2417
32e505f8c4deadb73c63bd20a598481cd164541d commit refs/svn/root/tags/1.1.947
ef9c3667ec419bcb6d5eb5b9dbacb1cff0b1051e commit refs/svn/root/tags/2.0.1187
a1a6a5bedb8949eb91f3509929edb9efa9ad2875 commit refs/svn/root/tags/2.0.1198
33a8f49da311caecdb5521759251bbcb78e3bff2 commit refs/svn/root/tags/2.0.1338
63e59278131281858296b56f4ef5dd91c332941a commit refs/svn/root/tags/2.0.1481
d23a1c662f772a7fc0d23a07794b57cfd9eff064 commit refs/svn/root/tags/2.0.1835
c53e2cc4660a9e3121dff33c28c1383766fda39b commit refs/svn/root/tags/2.0.2148
c7e0293ec09fee809fba707054cd1fd8fe492664 commit refs/svn/root/tags/2.0.2580
1d9e4813ae2fdc5c2b52f7115facda9059b009dc commit refs/svn/root/trunk
e946e96ce2b37a771769196027ae87b8f24181e0 commit refs/tags/1.0.1058
06928f42664384bd5e24c115f9c23acc2fd949da commit refs/tags/1.0.1240
9f059337974aa195386c5f3ee21957551624aa27 commit refs/tags/1.0.1653
db98d68f93d2e9127ab766d3bbe6c933ec169d29 commit refs/tags/1.0.764
20ad69c6b94bc8b73b613b5c21d367f22f423501 commit refs/tags/1.1.1163
458be535d7f0bc512e800759759e86a211a418b6 commit refs/tags/1.1.1290
df045ab97bee94e8cfe72b70802b837719899587 commit refs/tags/1.1.1556
cd8ce83868016a0854c0f3cf1b23ea68a32674a2 commit refs/tags/1.1.1706
fcd0801b93f48bd46b276ccb82678a70f11fc3ca commit refs/tags/1.1.1809
5df0902cfe973b0a041409ec2e8d2314f2b8031e commit refs/tags/1.1.2368
c9a06b4f43bab77ed283fe2736ab5c865e03026e commit refs/tags/1.1.2417
32e505f8c4deadb73c63bd20a598481cd164541d commit refs/tags/1.1.947
ef9c3667ec419bcb6d5eb5b9dbacb1cff0b1051e commit refs/tags/2.0.1187
a1a6a5bedb8949eb91f3509929edb9efa9ad2875 commit refs/tags/2.0.1198
33a8f49da311caecdb5521759251bbcb78e3bff2 commit refs/tags/2.0.1338
63e59278131281858296b56f4ef5dd91c332941a commit refs/tags/2.0.1481
d23a1c662f772a7fc0d23a07794b57cfd9eff064 commit refs/tags/2.0.1835
c53e2cc4660a9e3121dff33c28c1383766fda39b commit refs/tags/2.0.2148
c7e0293ec09fee809fba707054cd1fd8fe492664 commit refs/tags/2.0.2580

AFAICS, there are only refs from the SVN-to-Git translation and native Git ones.

> After deleting them (or confirming that there aren't),
> prune the reflogs with:
>
>   git reflog expire --expire-unreachable=now --all
>
> and then "git gc --prune=now".

Although there seem to be no backup refs in my repo, I decided to get
rid of all the refs/svn/* refs:

proj.git (BARE:master) $ du -sh
39G     .

proj.git (BARE:master) $ git for-each-ref --format="%(refname)" refs/svn/ |
    xargs -n 1 git update-ref -d

proj.git (BARE:master) $ git for-each-ref
c718eadcf8d09d68c385f0a9355a2c871474fb43 commit refs/heads/1.0
daa75889053b70515179e334cbe3fe6fc7873ff3 commit refs/heads/1.1
cb70db292c1f0c62170d05ffa8dad3c87a6f8ebd commit refs/heads/2.0
f1597e80fcea16bec96dc43f7ab706616126305b commit refs/heads/3.0
1d9e4813ae2fdc5c2b52f7115facda9059b009dc commit refs/heads/master
e946e96ce2b37a771769196027ae87b8f24181e0 commit refs/tags/1.0.1058
06928f42664384bd5e24c115f9c23acc2fd949da commit refs/tags/1.0.1240
9f059337974aa195386c5f3ee21957551624aa27 commit refs/tags/1.0.1653
db98d68f93d2e9127ab766d3bbe6c933ec169d29 commit refs/tags/1.0.764
20ad69c6b94bc8b73b613b5c21d367f22f423501 commit refs/tags/1.1.1163
458be535d7f0bc512e800759759e86a211a418b6 commit refs/tags/1.1.1290
df045ab97bee94e8cfe72b70802b837719899587 commit refs/tags/1.1.1556
cd8ce83868016a0854c0f3cf1b23ea68a32674a2 commit refs/tags/1.1.1706
fcd0801b93f48bd46b276ccb82678a70f11fc3ca commit refs/tags/1.1.1809
5df0902cfe973b0a041409ec2e8d2314f2b8031e commit refs/tags/1.1.2368
c9a06b4f43bab77ed283fe2736ab5c865e03026e commit refs/tags/1.1.2417
32e505f8c4deadb73c63bd20a598481cd164541d commit refs/tags/1.1.947
ef9c3667ec419bcb6d5eb5b9dbacb1cff0b1051e commit refs/tags/2.0.1187
a1a6a5bedb8949eb91f3509929edb9efa9ad2875 commit refs/tags/2.0.1198
33a8f49da311caecdb5521759251bbcb78e3bff2 commit refs/tags/2.0.1338
63e59278131281858296b56f4ef5dd91c332941a commit refs/tags/2.0.1481
d23a1c662f772a7fc0d23a07794b57cfd9eff064 commit refs/tags/2.0.1835
c53e2cc4660a9e3121dff33c28c1383766fda39b commit refs/tags/2.0.2148
c7e0293ec09fee809fba707054cd1fd8fe492664 commit refs/tags/2.0.2580

proj.git (BARE:master) $ git reflog expire --expire-unreachable=now --all

proj.git (BARE:master) $ git gc --prune=now
Enumerating objects: 1315030, done.
Counting objects: 100% (1315030/1315030), done.
Delta compression using up to 48 threads
Compressing objects: 100% (809937/809937), done.
Writing objects: 100% (1315030/1315030), done.
Total 1315030 (delta 495565), reused 1314739 (delta 495312)
Checking connectivity: 1315030, done.

proj.git (BARE:master) $ du -sh
38G     .

The size decreased, but not by much.

Thanks for the ideas though.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
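
The "xargs -n 1" pipeline above starts one "git update-ref -d" process per ref. As an alternative sketch, "git update-ref --stdin" accepts "delete <ref>" commands and applies them together as a single transaction. The hypothetical helper below only formats those commands, so its output can be reviewed before anything is deleted.

```sh
# update_ref_delete_cmds (hypothetical helper): turn ref names read on stdin
# (one per line) into "delete <ref>" commands for "git update-ref --stdin".
# It only prints the commands, so the list can be reviewed first.
update_ref_delete_cmds() {
    awk 'NF > 0 { print "delete " $1 }'   # skip blank lines
}
```

To preview: git for-each-ref --format='%(refname)' refs/svn/ | update_ref_delete_cmds. To apply, append | git update-ref --stdin to the same pipeline.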


* Re: Migration to Git LFS inflates repository multiple times
From: Mateusz Loskot @ 2018-11-13 20:24 UTC
  To: git

On Mon, 12 Nov 2018 at 00:47, Mateusz Loskot <mateusz@loskot.net> wrote:
>
> Hi,
>
> I'm posting here for the first time and I hope it's the right place to ask
> questions about Git LFS.
>
> TL;DR: Is it normal for a repository migrated to Git LFS to inflate multiple times,
> and how do I deal with it?

FYI, my questions have been answered in full on GitHub:
https://github.com/git-lfs/git-lfs/issues/3374

I'd like to thank Jeff and Ævar here for help too.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net

