git.vger.kernel.org archive mirror
From: Andreas Kalz <andreas-kalz@gmx.de>
To: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Philip Oakley <philipoakley@iee.email>, git@vger.kernel.org
Subject: Re: Git as data archive
Date: Mon, 9 Dec 2019 17:39:10 +0100
Message-ID: <fc38fbbe-e481-6a36-162e-34465f0a5bb0@gmx.de>
In-Reply-To: <d4e34fa6-e92c-5178-c61e-e4b87dab7e09@virtuell-zuhause.de>

Hi Thomas,
I committed it only to a local repository (git add . / git commit
-m"..."). But I never tested restoring the big files from the archive
:( and then I stumbled upon the bug description.

Now I tried it out and something bad happened:
E:\bilder_git>git checkout -- Hochzeitsmesse.mp4
error: bad object header
fatal: packed object 5c1403a85829c1c9e03bf04ac814d65bb72b617f (stored in
.git/objects/pack/pack-00246783dc8e6b7365220e75563b5cecfa358e11.pack) is
corrupt

During add / commit there was no problem, but now this is not a good
thing...

My C skills are not bad - I worked for about 10 years in embedded SW
development. But currently my time is limited as I have a 3-week-old
baby :)

All the best,
Andreas



On 09.12.2019 at 02:18, Thomas Braun wrote:
> On 08.12.2019 19:44, Andreas Kalz wrote:
>
> Hi Andreas,
>
>> @Thomas: are you Thomas Braun who studied at FH Regensburg?
> nope, sorry.
>
>> Well, currently the .git repository is 715GB and the maximum file size
>> is 9.5GB, but I did not get error messages due to that even if the
>> performance is quite low. The biggest pack* file is 24GB. There are some
>> files which are modified, but most are not modified.
> Okay that is kind-of-large. How did you add the 9.5GB file? AFAIK this
> could not have been done on Windows.
>
> Do you push that to a remote repository as well?
>
>> My question came up as I did not find any documentation about limits of
>> git, only a lot of entries about github and forum users who are
>> discussing old bugs of git. I also read about git-lfs and that it
>> is not very stable, which is why I have not used it yet.
> Although I'm not using git-lfs myself, from what I know it works well.
> But it does have the same limitation as stock git for windows as Philip
> pointed out already.
>
>> How can the delta compression settings and/or the big file threshold
>> limits be modified?
> These are plain git config settings. Have a look at [1]. The attributes
> are explained in [2-3]. Basically you can set in .gitattributes
>
> *.bin -delta -diff
>
> which would tell git that files with suffix bin should not be delta
> compressed and are always binary.
>
> You could also play around with turning compression completely off via
> core.compression or pack.compression.
>
> Hope that helps,
> Thomas
>
> PS: If you have resources to help fixing that long-standing bug in git
> for windows, there is a PR open [4] which has a WIP version. But beware
> you need good C skills and better-than-average git skills, or a
> Santa-Claus-style bag with monetary resources.
>
> [1]:
> https://git-scm.com/docs/git-config#Documentation/git-config.txt-corebigFileThreshold
> [2]: https://git-scm.com/docs/gitattributes#_code_delta_code
> [3]: https://git-scm.com/docs/gitattributes#_marking_files_as_binary
> [4]: https://github.com/git-for-windows/git/pull/2179
>
>> On 07.12.2019 at 19:04, Thomas Braun wrote:
>>> On 07.12.2019 17:54, Philip Oakley wrote:
>>>> Hi Andreas,
>>>>
>>>> On 06/12/2019 18:54, Andreas Kalz wrote:
>>>>> Hello,
>>>>> I am using git as archive and versioning also for photos. Apart from
>>>>> performance issues, I wanted to ask if there are hard limits and
>>>>> configurable limits (how to configure?) for maximum single file size
>>>>> and
>>>>> maximum .git archive size (Windows 64 Bit system)?
>>>>> Thanks in advance for your answer.
>>>>> All the best,
>>>>> Andreas
>>>> On Git the file size is currently limited to the size of `long`, rather than
>>>> `size_t`. Hence on Git for Windows the size limit is 32 bits, ~4GiB.
>>>>
>>>> Any change will be a big change as it ripples through many places in the
>>>> code base and, for some, will feel 'wrong'. I did some work [1-4] on top
>>>> of those of many others that was almost there, but...
>>> Adding to what Philip said: on Windows the size of exported archives
>>> (git archive) is currently also limited to 4GB, again because of
>>> the long vs size_t issue (which is not present on Linux though).
>>>
>>> So if you can switch to Linux or even macOS these issues are gone.
>>>
>>> The files in .git themselves (only the packfiles would be of
>>> interest here, I guess) do not have the long vs size_t issue. So
>>> packfiles can be larger than 4GB on 64bit Windows (with 64bit git of
>>> course).
>>>
>>> But depending on how large the biggest files are, it might be worth
>>> tweaking some of the settings, so that the created packfiles are
>>> readable on all platforms. I once created a repo on Linux which could
>>> not be checked out on Windows, and that is a bit annoying.
>>>
>>> So the questions are how large is each file? And what repository size do
>>> you expect? Are we talking about 20MB files and 10GB repository? Or a
>>> factor 100 more? And are you just adding files or are you modifying the
>>> added files? Depending on the file sizes it might then also be
>>> beneficial to tweak the delta compression settings and/or the big file
>>> threshold limits.
>>>
>>> Thomas
>>>
>>>> The alternative is git-lfs, which I don't personally use (see [4]).
>>>>
>>>> Philip
>>>>
>>>> [1] https://github.com/git-for-windows/git/pull/2179
>>>> [2] https://github.com/gitgitgadget/git/pull/115
>>>> [3] https://github.com/git-for-windows/git/issues/1063
>>>> [4] https://github.com/git-lfs/git-lfs/issues/2434
>>>>
>>>>
>>
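
To check an archive against the limit Philip describes in the thread (file size held in a `long`, hence about 4GiB on Git for Windows), something like the following Python sketch could be used. It is not from the thread: it assumes `unsigned long` is 32 bits wide on 64-bit Windows, and the names `WINDOWS_ULONG_MAX` and `files_over_limit` are made up for illustration.

```python
import os

# Assumption: on 64-bit Windows `unsigned long` is 32 bits wide,
# so the largest representable size is 2**32 - 1 bytes (just under 4 GiB).
WINDOWS_ULONG_MAX = 2**32 - 1


def files_over_limit(root, limit=WINDOWS_ULONG_MAX):
    """Yield (path, size) for files under root whose size exceeds limit.

    The .git directory is skipped, since only working-tree files are of
    interest here.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Unreadable or vanished file: skip it.
                continue
            if size > limit:
                yield path, size


if __name__ == "__main__":
    for path, size in files_over_limit("."):
        print(f"{size:>15d}  {path}")
```

Run from the top of the working tree, this lists every file that would not fit in a 32-bit size field, which may help decide which files to keep out of the repository on Windows.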

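The settings Thomas mentions (the `-delta` and `-diff` attributes, `core.bigFileThreshold`, `core.compression`, `pack.compression`) could be prepared roughly as in the Python sketch below. It is not from the thread: the helper names are hypothetical, and the suffixes and the `100m` threshold are example values only. The script just builds the lines and command arguments and prints them; it does not touch any repository.

```python
def gitattributes_lines(suffixes):
    """Return .gitattributes lines that turn off delta compression and
    textual diffs for the given file suffixes.

    Attributes on one line are separated by whitespace, not commas.
    """
    return [f"*.{s.lstrip('.')} -delta -diff" for s in suffixes]


def git_config_commands(big_file_threshold="512m", compression=0):
    """Return `git config` argument lists for the settings from the thread.

    A compression level of 0 means no compression; the threshold value is
    whatever the caller picks (the default here is only a placeholder).
    """
    level = str(compression)
    return [
        ["git", "config", "core.bigFileThreshold", big_file_threshold],
        ["git", "config", "core.compression", level],
        ["git", "config", "pack.compression", level],
    ]


if __name__ == "__main__":
    # Example values only: adjust suffixes and threshold to the archive.
    for line in gitattributes_lines(["bin", "mp4"]):
        print(line)
    for argv in git_config_commands("100m"):
        print(" ".join(argv))
```

The printed attribute lines would go into `.gitattributes` at the top of the working tree, and the printed commands would be run inside the repository.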

Thread overview: 6+ messages
2019-12-06 18:54 Git as data archive Andreas Kalz
2019-12-07 16:54 ` Philip Oakley
2019-12-07 18:04   ` Thomas Braun
2019-12-08 18:44     ` Andreas Kalz
2019-12-09  1:18       ` Thomas Braun
2019-12-09 16:39         ` Andreas Kalz [this message]
