* Re: Binary files
       [not found] <CAFc9kS8L-JJoJqKi7bB90qwKVW8gB=EFk9D8c=4YShqnamwa2w@mail.gmail.com>
@ 2017-07-20  7:41 ` Volodymyr Sendetskyi
  2017-07-20  7:58   ` Bryan Turner
                     ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Volodymyr Sendetskyi @ 2017-07-20  7:41 UTC
  To: git

It is well known that Git handles binary files in its repositories
poorly.
This is especially true of large files: even when these files do not
change, copies of them are snapshotted on each commit. So even
repositories with a small amount of code can grow very fast in size
if they contain some large binary files. SVN, by contrast, is
much better at this, because it changes the server version
of a file only if the file was actually modified.

So the question is: why not implement some feature that would
handle this problem?

Of course, I don't know Git's internal structure, how it works, or
its nuances (particularly how snapshots are made), so handling this
may be a big problem. But the easiest feature for me as an end user
would be something like '.gitbinary', where I could list binary files
that should behave as they do in SVN, or even better, if you can
implement it. Maybe there would be a need for separate kinds of
repositories, or even servers. But that would be a great change and a
logical next step in Git's evolution.


* Re: Binary files
  2017-07-20  7:41 ` Binary files Volodymyr Sendetskyi
@ 2017-07-20  7:58   ` Bryan Turner
  2017-07-20  8:01   ` Konstantin Khomoutov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Bryan Turner @ 2017-07-20  7:58 UTC
  To: Volodymyr Sendetskyi; +Cc: Git Users

On Thu, Jul 20, 2017 at 12:41 AM, Volodymyr Sendetskyi
<volodymyrse@devcom.com> wrote:
> It is well known that Git handles binary files in its repositories
> poorly.
> This is especially true of large files: even when these files do not
> change, copies of them are snapshotted on each commit. So even
> repositories with a small amount of code can grow very fast in size
> if they contain some large binary files. SVN, by contrast, is
> much better at this, because it changes the server version
> of a file only if the file was actually modified.
>
> So the question is: why not implement some feature that would
> handle this problem?

Like Git LFS or git annex? Features have been implemented to better
handle large files; they're just not necessarily part of core Git.
Have you checked whether one of those solutions might work for your
use case?

Best regards,
Bryan Turner


* Re: Binary files
  2017-07-20  7:41 ` Binary files Volodymyr Sendetskyi
  2017-07-20  7:58   ` Bryan Turner
@ 2017-07-20  8:01   ` Konstantin Khomoutov
  2017-07-20  8:32   ` Lars Schneider
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Konstantin Khomoutov @ 2017-07-20  8:01 UTC
  To: Volodymyr Sendetskyi; +Cc: git

On Thu, Jul 20, 2017 at 10:41:48AM +0300, Volodymyr Sendetskyi wrote:

> It is well known that Git handles binary files in its repositories
> poorly.
[...]
> So the question is: why not implement some feature that would
> handle this problem?
[...]

Have you examined git-lfs and git-annex?
(Actually, there are/were more solutions [1], but these two appear to be
the most used nowadays.)

Such solutions allow one to use Git for what it does best and defer
handling of big files (or files for which lock-modify-unlock works better
than the usual modify-merge) to a specialized solution.
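
To make the "defer to a specialized solution" idea concrete, taking Git
LFS as the example: Git itself stores only a small text pointer, while
the large content lives in a separate store keyed by its SHA-256. The
sketch below builds such a pointer in the format of the Git LFS v1
pointer spec. The helper name lfs_pointer and the 5 MB sample payload
are made up for illustration, and this is not how the git-lfs client
itself is implemented.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return Git LFS v1 pointer text for the given file content.

    Hypothetical helper for illustration; the real git-lfs client
    produces this text through its clean filter.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Hypothetical 5 MB binary asset.
big_file = b"\x00" * 5_000_000
pointer = lfs_pointer(big_file)

print(pointer)
print(len(pointer.encode()), "bytes tracked by Git instead of", len(big_file))
```

Only the pointer text is committed, so history and clones stay small;
the content itself is fetched from the LFS store on checkout.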

1. http://blog.deveo.com/storing-large-binary-files-in-git-repositories/



* Re: Binary files
  2017-07-20  7:41 ` Binary files Volodymyr Sendetskyi
  2017-07-20  7:58   ` Bryan Turner
  2017-07-20  8:01   ` Konstantin Khomoutov
@ 2017-07-20  8:32   ` Lars Schneider
  2017-07-20 17:22   ` Stefan Beller
  2017-07-20 18:49   ` Igor Djordjevic
  4 siblings, 0 replies; 8+ messages in thread
From: Lars Schneider @ 2017-07-20  8:32 UTC
  To: Volodymyr Sendetskyi; +Cc: git


> On 20 Jul 2017, at 09:41, Volodymyr Sendetskyi <volodymyrse@devcom.com> wrote:
> 
> It is well known that Git handles binary files in its repositories
> poorly.
> This is especially true of large files: even when these files do not
> change, copies of them are snapshotted on each commit. So even
> repositories with a small amount of code can grow very fast in size
> if they contain some large binary files. SVN, by contrast, is
> much better at this, because it changes the server version
> of a file only if the file was actually modified.
>
> So the question is: why not implement some feature that would
> handle this problem?
>
> Of course, I don't know Git's internal structure, how it works, or
> its nuances (particularly how snapshots are made), so handling this
> may be a big problem. But the easiest feature for me as an end user
> would be something like '.gitbinary', where I could list binary files
> that should behave as they do in SVN, or even better, if you can
> implement it. Maybe there would be a need for separate kinds of
> repositories, or even servers. But that would be a great change and a
> logical next step in Git's evolution.

Git LFS [1] might be the workaround you want. There are efforts to bring
large file support natively to Git [2].

I tried to explain Git LFS in more detail here:
https://www.youtube.com/watch?v=YQzNfb4IwEY

- Lars


[1] https://git-lfs.github.com/
[2] https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/



* Re: Binary files
  2017-07-20  7:41 ` Binary files Volodymyr Sendetskyi
                     ` (2 preceding siblings ...)
  2017-07-20  8:32   ` Lars Schneider
@ 2017-07-20 17:22   ` Stefan Beller
  2017-07-20 18:49   ` Igor Djordjevic
  4 siblings, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2017-07-20 17:22 UTC
  To: Volodymyr Sendetskyi; +Cc: git

On Thu, Jul 20, 2017 at 12:41 AM, Volodymyr Sendetskyi
<volodymyrse@devcom.com> wrote:
> It is well known that Git handles binary files in its repositories
> poorly.
> This is especially true of large files: even when these files do not
> change, copies of them are snapshotted on each commit. So even
> repositories with a small amount of code can grow very fast in size
> if they contain some large binary files. SVN, by contrast, is
> much better at this, because it changes the server version
> of a file only if the file was actually modified.
>
> So the question is: why not implement some feature that would
> handle this problem?

There are 'external' solutions such as Git LFS and git-annex, mentioned
in nearby replies.

But note that there are also efforts to handle large binary files internally:
https://public-inbox.org/git/3420d9ae9ef86b78af1abe721891233e3f5865a2.1500508695.git.jonathantanmy@google.com/
https://public-inbox.org/git/20170713173459.3559-1-git@jeffhostetler.com/
https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/


* Re: Binary files
  2017-07-20  7:41 ` Binary files Volodymyr Sendetskyi
                     ` (3 preceding siblings ...)
  2017-07-20 17:22   ` Stefan Beller
@ 2017-07-20 18:49   ` Igor Djordjevic
  2017-07-20 20:40     ` Junio C Hamano
  4 siblings, 1 reply; 8+ messages in thread
From: Igor Djordjevic @ 2017-07-20 18:49 UTC
  To: Volodymyr Sendetskyi, git

Hi Volodymyr,

On 20/07/2017 09:41, Volodymyr Sendetskyi wrote:
> It is well known that Git handles binary files in its repositories
> poorly.
> This is especially true of large files: even when these files do not
> change, copies of them are snapshotted on each commit. So even
> repositories with a small amount of code can grow very fast in size
> if they contain some large binary files. SVN, by contrast, is
> much better at this, because it changes the server version
> of a file only if the file was actually modified.

You have already received some suggestions for making large binary
files easier to handle, but I just wanted to comment on this part of
your message, as it doesn't seem to be correct.

Even though each commit includes every file in the repository (a
commit being a full snapshot of the repository state), big binary
files included, that is only how it looks from an end user's
perspective.

The actual implementation is smarter than that: if a file hasn't
changed between commits, it won't get copied or written to the Git
object database again.

Under the hood, many different commits can point to the same
(unchanged) file, so repository size _does not_ grow very fast with
each commit if a large binary file stays unchanged.
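
A minimal sketch of why this holds, assuming nothing beyond how Git
names a blob: the object ID is the SHA-1 of a "blob <size>\0" header
followed by the content, so identical content always maps to the same
object. The ObjectStore class and the two dict "commits" below are toy
stand-ins invented for this example (no trees, zlib compression or
packfiles), not Git's actual data structures.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store: one entry per distinct content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Content that is already stored is not written a second time.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
big_binary = b"\x00" * 1_000_000  # hypothetical large, unchanged file

# Two "commits", each a full snapshot mapping path -> blob ID.
commit_1 = {
    "big.bin": store.put(big_binary),
    "main.c": store.put(b"int main(void) { return 0; }\n"),
}
commit_2 = {
    "big.bin": store.put(big_binary),  # unchanged between commits
    "main.c": store.put(b"int main(void) { return 1; }\n"),  # modified
}

print(commit_1["big.bin"] == commit_2["big.bin"])  # same blob in both commits
print(len(store.objects))  # distinct objects stored
```

Both snapshots list big.bin, but the store holds its content only once;
only the modified main.c adds a new object.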

Usually, the biggest concern with Git and large files[1], in
comparison to SVN, for example, is something else: the Git model
assumes that each repository clone holds the complete repository
history, with all the different file versions included, so you can't
fetch just some of them, or only the last snapshot, to keep your local
repository small.

If the repository you're cloning from is a big one, your local clone
will be as well, even if you are not really interested in the big
files at all... but you have already got some suggestions for handling
that, as pointed out :)

Just note that it's not really Git vs SVN here, but rather the
distributed vs centralized approach in general, as you can't both have
everything and yet skip something at the same time. Different systems
may have different workarounds for a specific workflow, though.

[1] Besides storing each file version as a full-sized snapshot (at
first, at least, until delta compression kicks in during packing).

Regards,
Buga


* Re: Binary files
  2017-07-20 18:49   ` Igor Djordjevic
@ 2017-07-20 20:40     ` Junio C Hamano
  2017-07-21 17:46       ` Igor Djordjevic
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-07-20 20:40 UTC
  To: Igor Djordjevic; +Cc: Volodymyr Sendetskyi, git

Igor Djordjevic <igor.d.djordjevic@gmail.com> writes:

> On 20/07/2017 09:41, Volodymyr Sendetskyi wrote:
>> It is well known that Git handles binary files in its repositories
>> poorly.
>> This is especially true of large files: even when these files do not
>> change, copies of them are snapshotted on each commit. So even
>> repositories with a small amount of code can grow very fast in size
>> if they contain some large binary files. SVN, by contrast, is
>> much better at this, because it changes the server version
>> of a file only if the file was actually modified.
>
> You have already received some suggestions for making large binary
> files easier to handle, but I just wanted to comment on this part of
> your message, as it doesn't seem to be correct.

All correct.  Thanks.


* Re: Binary files
  2017-07-20 20:40     ` Junio C Hamano
@ 2017-07-21 17:46       ` Igor Djordjevic
  0 siblings, 0 replies; 8+ messages in thread
From: Igor Djordjevic @ 2017-07-21 17:46 UTC
  To: Junio C Hamano; +Cc: Volodymyr Sendetskyi, git

On 20/07/2017 22:40, Junio C Hamano wrote:
> Igor Djordjevic <igor.d.djordjevic@gmail.com> writes:
>> On 20/07/2017 09:41, Volodymyr Sendetskyi wrote:
>>> It is well known that Git handles binary files in its repositories
>>> poorly.
>>> This is especially true of large files: even when these files do not
>>> change, copies of them are snapshotted on each commit. So even
>>> repositories with a small amount of code can grow very fast in size
>>> if they contain some large binary files. SVN, by contrast, is
>>> much better at this, because it changes the server version
>>> of a file only if the file was actually modified.
>>
>> You have already received some suggestions for making large binary
>> files easier to handle, but I just wanted to comment on this part of
>> your message, as it doesn't seem to be correct.
> 
> All correct.  Thanks.

No problem, and thanks for the confirmation. Being relatively new around
here, I appreciate it, at least knowing that I got it right myself :)


