git.vger.kernel.org archive mirror
* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
  @ 2023-02-01 15:21 14%                 ` demerphq
  0 siblings, 0 replies; 12+ results
From: demerphq @ 2023-02-01 15:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Michal Suchánek, brian m. carlson, Konstantin Ryabitsev,
	Eli Schwartz, Git List

On Wed, 1 Feb 2023 at 14:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Wed, Feb 01 2023, demerphq wrote:
>
> > On Wed, 1 Feb 2023, 20:21 Michal Suchánek, <msuchanek@suse.de> wrote:
> >>
> >> On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
> >> > Why does it have to be gzip? It is not that hard to come up with a
> >
> >> historical reasons?
> >
> > Currently git doesn't advertise that archive creation is stable
> > right[1]? So I wrote that with the assumption that this new
> > compression would only be used when making a new archive with a
> > hypothetical new '--stable' option. So historical reasons don't come
> > up. Or was there some other form of history that you meant?
>
> We haven't advertised it, but people have come to rely on it, as the
> widespread breakages reported when upgrading to v2.38.0 at the start of
> this thread show.
>
> That's unfortunate, and those people probably shouldn't have done that,
> but that's water under the bridge. I think it would be irresponsible to
> change the output willy-nilly at this point, especially when it seems
> rather easy to find some compromise everyone will be happy with.
>
> > I'm just trying to point out here that stable compression is doable
> > and doesn't need to be as complex as specifying a stable gzip format.
> > I am not even saying git should just do this, just that it /could/ if
> > it decided that stability was important, and that doing so wouldn't
> > involve the complexity that Avar was implying would be needed.  Simple
> > compression schemes like LZ variants are pretty straightforward to
> > implement, achieve pretty good compression and can run pretty fast.
> >
> > Yves
> > [1] if it did the issue kicking off this thread would not have
> > happened as there would be a test that would have noticed the change.
>
> I have some patches I'm about to submit to address issues in this
> thread, and it does add *a* test for archive output stability.
>
> But I'm not at all confident that it's exhaustive. I just found it by
> experiment, by locating tests of ours where the "git archive" output at
> the end is different with gzip and "git archive gzip".
>
> But is it guaranteed to find all potential cases where repository
> content might trigger different output with different gzip
> implementations? I don't know, but probably not.
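One cheap sanity check in that direction (a sketch only, assuming a POSIX shell with git on PATH; it verifies self-consistency, not cross-implementation stability) is to archive the same commit twice and compare the bytes:

```shell
# Minimal self-consistency check for "git archive" gzip output: the same
# tree must compress to identical bytes on every run. This does NOT prove
# that different gzip implementations agree, only that one git is stable.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
echo content > file.txt
git add file.txt
GIT_AUTHOR_DATE='2005-04-07T22:13:13Z' GIT_COMMITTER_DATE='2005-04-07T22:13:13Z' \
	git -c user.name=test -c user.email=test@example.com commit -qm init
git archive --format=tgz HEAD > a.tgz
git archive --format=tgz HEAD > b.tgz
cmp a.tgz b.tgz && echo "archive output is self-consistent"
```

A cross-implementation test would additionally have to compare this against every gzip implementation users might decode or reproduce with, which is exactly the hard part.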

BTW, I just happened to be looking at the zstd docs (I am updating
code that uses it), and I saw this:

Zstandard's format is stable and documented in
[RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple
independent implementations are already available.
This repository represents the reference implementation, provided as
an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C**
library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz`
and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard
homepage](http://www.zstd.net/#other-languages).

So it sounds like that is a spec you could use. I'm not sure exactly
what they mean by "stable", but given the .gz compatibility maybe it
would be worth considering. It's a lot faster than zlib. (The library I
support includes Snappy, Zlib, and Zstd, and the latter is faster and
better than the other two.)

Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply	[relevance 14%]

* git archive -o something.tar.zst but file info just says "POSIX tar archive"
@ 2021-10-10 11:19 11% Bagas Sanjaya
  0 siblings, 0 replies; 12+ results
From: Bagas Sanjaya @ 2021-10-10 11:19 UTC (permalink / raw)
  To: Git Users

Hi,

I noticed the following (possible bug?) when I tried to create a zstd tar
archive (.tar.zst) with `git archive`.

First, I created a plain tar archive with `git archive`, then extracted
it and re-archived it as a zstd tar archive:

```
(on the repo)

$ git archive -o /tmp/something.tar --prefix=something/ HEAD

(outside the repo, on /tmp)

$ tar xvf something.tar
$ tar --zstd -c -v -f something.tar.zst something/
```

I checked that the archive was indeed a zstd tar archive:

```
$ file something.tar.zst
something.tar.zst: Zstandard compressed data (v0.8+), Dictionary ID: None
```

Now I created the same archive with `git archive` directly:

```
(on the repo)

$ git archive -o /tmp/something1.tar.zst --prefix=something/ HEAD
```

But `file` reported something different for that archive:

```
(outside the repo, on /tmp)
$ file something1.tar.zst
something1.tar.zst: POSIX tar archive
```

I expected `something1.tar.zst` to be a proper zstd tar archive, not a
plain tar archive like the one above.
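For what it's worth, the distinction is visible in the raw bytes without `file`. The snippet below is illustrative and builds its own plain tar; the same offset-257 check on `something1.tar.zst` would print `ustar`, while a real zstd file begins with the magic `28 b5 2f fd`:

```shell
# A plain tar has no leading magic number, but the first header carries
# the string "ustar" at byte offset 257; a zstd frame instead starts
# with the 4-byte magic 28 b5 2f fd.
tmp=$(mktemp -d)
echo hello > "$tmp/f"
tar -cf "$tmp/plain.tar" -C "$tmp" f
head -c 4 "$tmp/plain.tar" | od -An -tx1                   # not 28 b5 2f fd
dd if="$tmp/plain.tar" bs=1 skip=257 count=5 2>/dev/null   # prints: ustar
```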

-- 
An old man doll... just what I always wanted! - Clara

* Re: Pain points in Git's patch flow
  2021-04-15 15:45  7% ` Son Luong Ngoc
@ 2021-04-19  2:57  0%   ` Eric Wong
  0 siblings, 0 replies; 12+ results
From: Eric Wong @ 2021-04-19  2:57 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Jonathan Nieder, git, Raxel Gutierrez, mricon, patchwork,
	Junio C Hamano, Taylor Blau, Emily Shaffer

Son Luong Ngoc <sluongng@gmail.com> wrote:
> Hi there,
> 
> I'm not a regular contributor but I have started subscribing to the
> Git Mailing List recently.  So I thought it might be worth sharing my
> personal view on this.
> 
> After writing all of the below, I do realize that I have written quite a
> rant, some of which I think some might consider to be off topic.  For
> that, I do want to apologize beforehand.

Thanks for the feedback, some points below.

>  Tue, Apr 13, 2021 at 11:13:26PM -0700, Jonathan Nieder wrote:
> > Hi,
> > 
> ...
> > 
> > Those four are important in my everyday life.  Questions:
> > 
> >  1. What pain points in the patch flow for git.git are important to
> >     you?
> 
> There are several points I want to highlight:
> 
> 1. Issue about reading the Mailing List:
> 
> - Subscribing to Git's Mailing List is not trivial:
>   It takes a lot of time to set up the email subscription.  I remember
>   having to google through a few documents to get my subscription
>   working.
> 
> - And even after having subscribed, I was bombarded with a set
>   of spam emails that were sent to the mailing list address.  These spam
>   messages range anywhere from absurd to disguising themselves as
>   legitimate users trying to contact you about a shiny new tech product.

Note that subscription is totally optional.

Gmail's mail filters probably aren't very good, perhaps
SpamAssassin or similar filters can be added locally to improve
things for you.

Spam filtering is a complex topic and Google's monopolistic
power probably doesn't inspire them to do better.

> 2. Issue about joining the conversation in the Mailing List:
> 
> - Setting up an email client to reply to the Mailing List was definitely
>   not trivial.  It's not trivial to send a reply without subscribing to
>   the ML (i.e. using headers provided by one of the archives).
>   The list does not accept HTML emails, which many clients
>   use as the default format.  Getting the formatting to work for line
>   wrapping is also a challenge depending on the client that you use.

The spam (and phishing) problem would be worse if HTML mail were
accepted.  Obfuscation/misdirection techniques used by spammers
and phishers aren't available in plain-text.

It's also more expensive to filter + archive HTML mail due to
decoding and size overheads, which makes it more expensive for
others to mirror/fork things.

> - It's a bit intimidating to ask 'trivial questions' about the patch and
>   create 'noise' in the ML.

I'm sorry you feel that way.  I understand the Internet and its
persistence (especially with mail archives :x) can have a
chilling effect on people.  I think the way to balance things is
to allow/encourage anonymity or pseudonyms, but some folks here
might disagree with me for copyright reasons.  OTOH, don't ask,
don't tell :)

(I am not speaking as a representative of the git project)

> 3. Issue with the archive:
> 
> - I don't find the ML archive trivial for newcomers.  It took me a bit
>   of time to realize: 'Oh, if I scroll to the bottom and find the "Thread
>   overview" then I can navigate a mailing thread a lot more easily'.

(I'm the maintainer of public-inbox, the archival software you
seem to be referring to).

I'm not sure how to make "Thread overview" easier to find
without cluttering the display near the top.  Maybe I'll try
aria labels in the Subject: link...

> - The lack of labeling / categorization that I can filter on while browsing
>   through the archive makes the 'browse' experience quite
>   unpleasant.  Search is one way to do it, but a newcomer would not be
>   knowledgeable enough to craft a search query to get the archive view just
>   right.  Perhaps a way to provide a curated set of categories would be
>   nice.

Perhaps TODO files/comments in the source tree are acceptable;
or a regularly-posted mail similar to "What's cooking".

Having a centralized website/tracker would give too much power
and influence to the people/orgs who run the site.  It would likely
either require network access or require learning more software
to synchronize.

> - Lost track of issues / discussion:
>   A quick example would be me searching for Git's zstd support
>   recently with 
> 
>   > https://lore.kernel.org/git/?q=zstandard 
> 
>   and got next to no relevant result.  However if I were to query
> 
>   > 'https://lore.kernel.org/git/?q=zstd'
> 
>   then a very relevant thread from Peff appeared.  I think this could be
>   avoided if the search in the ML archive did more than just match exact
>   text.

I'm planning to support Xapian synonyms for that, but haven't
gotten around to making it configurable+reproducible by admins.
Everything in public-inbox is designed to be reproducible+forkable.

> 4. Lack of way to run test suite / CI:
> 
>   It would be nice if we could discuss patches with CI results as
>   part of the conversation.  Right now we mostly have to run
>   benchmarks/tests manually and paste the results.
> 
>   But for folks who don't have a dev environment ready at hand (newcomers,
>   or during travel with only phone access), it would be nice to
>   have a way to run tests without a dev environment.

Fwiw, the GCC Farm project gives ssh accounts for all free
software contributors, not just gcc hackers: https://cfarm.tetaneutral.net
Perhaps there's other similar services, too.

Slow down and enjoy travel :)  There's very little in free
software urgent enough to require constant attention.  Email is
well-suited for asynchronous work, and nobody should expect
instant replies.  The always-on nature of the modern Internet
and smartphones increases stress and dangerous situations; so I
hope free software hackers aren't contributing to that.

>   This was mostly solved in the context of the work spent on GitHub's
>   Actions workflow.  But if we are discussing the pure patch flow, this
>   is a gap.
> 
> >  2. What tricks do you use to get by with those existing pain points?
> 
> For (1):
> - I had to invest a lot of time into setting up a set of Gmail search
>   filters: move mails with topics that I'm interested in under a special
>   tag while archiving the rest, and regularly check whether anything
>   interesting went to the archive by accident.
> 
> For (2):
> - I had to set up Mutt + Tmux to have a compatible experience sending
>   replies like this one.

Fwiw, git-send-email works for non-patch mails, too.  I don't
want a monoculture around mutt or any particular clients, either.
(I've never used tmux and don't see why it's necessary, here).

Anyways, thanks again for the feedback.

* Re: Pain points in Git's patch flow
  @ 2021-04-15 15:45  7% ` Son Luong Ngoc
  2021-04-19  2:57  0%   ` Eric Wong
  0 siblings, 1 reply; 12+ results
From: Son Luong Ngoc @ 2021-04-15 15:45 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Raxel Gutierrez, mricon, patchwork, Junio C Hamano,
	Taylor Blau, Emily Shaffer

Hi there,

I'm not a regular contributor but I have started subscribing to the
Git Mailing List recently.  So I thought it might be worth sharing my
personal view on this.

After writing all of the below, I do realize that I have written quite a
rant, some of which I think some might consider to be off topic.  For
that, I do want to apologize beforehand.

 Tue, Apr 13, 2021 at 11:13:26PM -0700, Jonathan Nieder wrote:
> Hi,
> 
...
> 
> Those four are important in my everyday life.  Questions:
> 
>  1. What pain points in the patch flow for git.git are important to
>     you?

There are several points I want to highlight:

1. Issue about reading the Mailing List:

- Subscribing to Git's Mailing List is not trivial:
  It takes a lot of time to set up the email subscription.  I remember
  having to google through a few documents to get my subscription
  working.

- And even after having subscribed, I was bombarded with a set
  of spam emails that were sent to the mailing list address.  These spam
  messages range anywhere from absurd to disguising themselves as
  legitimate users trying to contact you about a shiny new tech product.

2. Issue about joining the conversation in the Mailing List:

- Setting up an email client to reply to the Mailing List was definitely
  not trivial.  It's not trivial to send a reply without subscribing to
  the ML (i.e. using headers provided by one of the archives).
  The list does not accept HTML emails, which many clients
  use as the default format.  Getting the formatting to work for line
  wrapping is also a challenge depending on the client that you use.

- It's a bit intimidating to ask 'trivial questions' about the patch and
  create 'noise' in the ML.

3. Issue with the archive:

- I don't find the ML archive trivial for newcomers.  It took me a bit
  of time to realize: 'Oh, if I scroll to the bottom and find the "Thread
  overview" then I can navigate a mailing thread a lot more easily'.

- The lack of labeling / categorization that I can filter on while browsing
  through the archive makes the 'browse' experience quite
  unpleasant.  Search is one way to do it, but a newcomer would not be
  knowledgeable enough to craft a search query to get the archive view just
  right.  Perhaps a way to provide a curated set of categories would be
  nice.

- Lost track of issues / discussion:
  A quick example would be me searching for Git's zstd support
  recently with 

  > https://lore.kernel.org/git/?q=zstandard 

  and got next to no relevant result.  However if I were to query

  > 'https://lore.kernel.org/git/?q=zstd'

  then a very relevant thread from Peff appeared.  I think this could be
  avoided if the search in the ML archive did more than just match exact
  text.

4. Lack of way to run test suite / CI:

  It would be nice if we could discuss patches with CI results as
  part of the conversation.  Right now we mostly have to run
  benchmarks/tests manually and paste the results.

  But for folks who don't have a dev environment ready at hand (newcomers,
  or during travel with only phone access), it would be nice to
  have a way to run tests without a dev environment.

  This was mostly solved in the context of the work spent on GitHub's
  Actions workflow.  But if we are discussing the pure patch flow, this
  is a gap.

>  2. What tricks do you use to get by with those existing pain points?

For (1):
- I had to invest a lot of time into setting up a set of Gmail search
  filters: move mails with topics that I'm interested in under a special
  tag while archiving the rest, and regularly check whether anything
  interesting went to the archive by accident.

For (2):
- I had to set up Mutt + Tmux to have a compatible experience sending
  replies like this one.

- All the patches I have submitted were through
  > https://github.com/gitgitgadget/git/pulls
  and it was not entirely trivial to get permission to send email from a
  PR.

For (3):
- Spending time reading git blame / git log / commit messages helps
  identify the keywords I need to refine my search results in the ML
  archive.  This requires some commitment and is a barrier to entry for
  newcomers.

- Using services like GitHub Search or Sourcegraph helped a lot in terms
  of navigating through the commit messages / git blame.

For (4):
- I leverage both Github action and a patch that added Gitlab CI to run
  the test suite.

>  3. Do you think patchwork goes in a direction that is likely to help
>     with these?
>
>  4. What other tools would you like to see that could help?

With all that said, I don't know if patchwork will solve the problems
above.  I do understand that the current patch workflow comes with a
certain set of advantages, and adopting another tool will most likely be
a trade-off.

Personally I have been spending more and more time reading through
git.git via the Sourcegraph Web UI, and I would love for its search
feature to extend to searching the Mailing List from a relevant commit
if possible.  I have also tried both GitHub's Codespaces
and Microsoft's DevContainer to set up an opinionated IDE with predefined
tasks that help execute the test suite.  I think these tools (or
their competitors such as Gitpod) are quite ideal for quickly onboarding
new contributors onto a history-rich codebase such as git.git.

Perhaps someone could configure a set of sane defaults, including editor
extensions that would handle email config for first-time users.

As for code review and issue tracking tooling, I don't think there is
a perfect solution.  Any solution (GitHub PRs, GitLab MRs, Gerrit,
Phabricator) would come with its own set of tradeoffs.  I like the
prospect of Patchwork improving the patch workflow, though.  Perhaps I
will give it a try.

> 
> Thanks,
> Jonathan

Thanks,
Son Luong.

* Re: [PATCH] archive: support compression levels beyond 9
  2020-11-09 18:35  0% ` Junio C Hamano
@ 2020-11-09 23:48 14%   ` René Scharfe
  0 siblings, 0 replies; 12+ results
From: René Scharfe @ 2020-11-09 23:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

Am 09.11.20 um 19:35 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Compression programs like zip, gzip, bzip2 and xz allow adjusting the
>> trade-off between CPU cost and size gain with numerical options from -1
>> for fast compression to -9 for a high compression ratio.  zip also
>> accepts -0 for storing files verbatim.  git archive directly supports
>> these single-digit compression levels for ZIP output and passes them to
>> filters like gzip.
>>
>> Zstandard additionally supports compression level options -10 to -19, or
>> up to -22 with --ultra.  This *seems* to work with git archive in most
>> cases, e.g. it will produce an archive with -19 without complaining, but
>> since it only supports single-digit compression level options, -19 is
>> parsed as -1 followed by -9 and is thus the same as -9.
>>
>> Allow git archive to accept multi-digit compression levels to support
>> the full range supported by zstd.  Explicitly reject them for the ZIP
>> format, as otherwise deflateInit2() would just fail with a somewhat
>> cryptic "stream consistency error".
>
> The implementation looks more like "not enable them for the ZIP
> format", but the symptom observable to end-users is exactly
> "explicitly reject", so that's OK ;-)
>
> As with the usual compression levels, this is only about how the
> deflator finds a better result, and the stream is understandable by
> any existing inflator, right?

Support for higher levels might have been added in later versions of
Zstandard -- https://github.com/facebook/zstd/blob/dev/CHANGELOG
mentions "Command line utility compatible with high compression levels"
for v0.4.0.  I'm not aware of implementations of the algorithm other
than the original one from Facebook, so I don't know how compatible
they are.  It's not a problem we can solve in Git, though.

Side note: Using Zstandard with git archive requires the config setting
tar.tar.zst.command=zstd.
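A sketch of that setup (assuming the zstd binary is installed; the output path and level are illustrative):

```shell
# Register zstd as the filter for *.tar.zst; git archive then pipes the
# tar stream through it, appending the compression level as an argument.
git config tar.tar.zst.command "zstd"
# With this patch, a multi-digit level such as -19 reaches the filter
# instead of being misparsed as the single-digit options -1 and -9:
git archive -o /tmp/repo.tar.zst -19 HEAD
```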

>> diff --git a/archive.c b/archive.c
>> index 3c1541af9e..7a888c5338 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -529,10 +529,12 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>>  	return 0;
>>  }
>>
>> -#define OPT__COMPR(s, v, h, p) \
>> -	OPT_SET_INT_F(s, NULL, v, h, p, PARSE_OPT_NONEG)
>> -#define OPT__COMPR_HIDDEN(s, v, p) \
>> -	OPT_SET_INT_F(s, NULL, v, "", p, PARSE_OPT_NONEG | PARSE_OPT_HIDDEN)
>> +static int number_callback(const struct option *opt, const char *arg, int unset)
>> +{
>> +	BUG_ON_OPT_NEG(unset);
>> +	*(int *)opt->value = strtol(arg, NULL, 10);
>> +	return 0;
>> +}
>>
>>  static int parse_archive_args(int argc, const char **argv,
>>  		const struct archiver **ar, struct archiver_args *args,
>> @@ -561,16 +563,8 @@ static int parse_archive_args(int argc, const char **argv,
>>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>>  			N_("read .gitattributes in working directory")),
>>  		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
>> -		OPT__COMPR('0', &compression_level, N_("store only"), 0),
>> -		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
>> -		OPT__COMPR_HIDDEN('2', &compression_level, 2),
>> -		OPT__COMPR_HIDDEN('3', &compression_level, 3),
>> -		OPT__COMPR_HIDDEN('4', &compression_level, 4),
>> -		OPT__COMPR_HIDDEN('5', &compression_level, 5),
>> -		OPT__COMPR_HIDDEN('6', &compression_level, 6),
>> -		OPT__COMPR_HIDDEN('7', &compression_level, 7),
>> -		OPT__COMPR_HIDDEN('8', &compression_level, 8),
>> -		OPT__COMPR('9', &compression_level, N_("compress better"), 9),
>> +		OPT_NUMBER_CALLBACK(&compression_level,
>> +			N_("set compression level"), number_callback),
>
> Doubly nice.  Adds a feature while removing lines.
>
> Do we miss the description given in "git archive -h" though?
>
>     usage: git archive [<options>] <tree-ish> [<path>...]
>        or: git archive --list
>        ...
>         -v, --verbose         report archived files on stderr
>         -0                    store only
>         -1                    compress faster
>         -9                    compress better
>

Perhaps; I just couldn't cram it all into a single line.  Showing an
acceptable range would be nice and terse, but that depends on the
compressor.

Hmm, adding an option for passing arbitrary options to the filter and
removing the feature flag ARCHIVER_WANT_COMPRESSION_LEVELS from
archive-tar.c would be cleaner overall.  The latter would be a
regression, though.

René

* Re: [PATCH] archive: support compression levels beyond 9
  2020-11-09 16:05  9% [PATCH] archive: support compression levels beyond 9 René Scharfe
@ 2020-11-09 18:35  0% ` Junio C Hamano
  2020-11-09 23:48 14%   ` René Scharfe
  0 siblings, 1 reply; 12+ results
From: Junio C Hamano @ 2020-11-09 18:35 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List

René Scharfe <l.s.r@web.de> writes:

> Compression programs like zip, gzip, bzip2 and xz allow adjusting the
> trade-off between CPU cost and size gain with numerical options from -1
> for fast compression to -9 for a high compression ratio.  zip also
> accepts -0 for storing files verbatim.  git archive directly supports
> these single-digit compression levels for ZIP output and passes them to
> filters like gzip.
>
> Zstandard additionally supports compression level options -10 to -19, or
> up to -22 with --ultra.  This *seems* to work with git archive in most
> cases, e.g. it will produce an archive with -19 without complaining, but
> since it only supports single-digit compression level options, -19 is
> parsed as -1 followed by -9 and is thus the same as -9.
>
> Allow git archive to accept multi-digit compression levels to support
> the full range supported by zstd.  Explicitly reject them for the ZIP
> format, as otherwise deflateInit2() would just fail with a somewhat
> cryptic "stream consistency error".

The implementation looks more like "not enable them for the ZIP
format", but the symptom observable to end-users is exactly
"explicitly reject", so that's OK ;-)

As with the usual compression levels, this is only about how the
deflator finds a better result, and the stream is understandable by
any existing inflator, right?

> diff --git a/archive.h b/archive.h
> index 82b226011a..e3d04e8ab3 100644
> --- a/archive.h
> +++ b/archive.h
> @@ -36,6 +36,7 @@ const char *archive_format_from_filename(const char *filename);
>
>  #define ARCHIVER_WANT_COMPRESSION_LEVELS 1
>  #define ARCHIVER_REMOTE 2
> +#define ARCHIVER_HIGH_COMPRESSION_LEVELS 4
>  struct archiver {
>  	const char *name;
>  	int (*write_archive)(const struct archiver *, struct archiver_args *);
> diff --git a/archive-tar.c b/archive-tar.c
> index f1a1447ebd..a971fdc0f6 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -374,7 +374,8 @@ static int tar_filter_config(const char *var, const char *value, void *data)
>  		ar = xcalloc(1, sizeof(*ar));
>  		ar->name = xmemdupz(name, namelen);
>  		ar->write_archive = write_tar_filter_archive;
> -		ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS;
> +		ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS |
> +			    ARCHIVER_HIGH_COMPRESSION_LEVELS;

Nice.  

Hindsight tells me that WANT should have been ACCEPT, though---and
an addition of ARCHIVER_ACCEPT_HIGH_COMPRESSION_LEVELS would be in
line with that.  But that probably is too minor---it just stood out
a bit funny to me.

> diff --git a/archive.c b/archive.c
> index 3c1541af9e..7a888c5338 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -529,10 +529,12 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	return 0;
>  }
>
> -#define OPT__COMPR(s, v, h, p) \
> -	OPT_SET_INT_F(s, NULL, v, h, p, PARSE_OPT_NONEG)
> -#define OPT__COMPR_HIDDEN(s, v, p) \
> -	OPT_SET_INT_F(s, NULL, v, "", p, PARSE_OPT_NONEG | PARSE_OPT_HIDDEN)
> +static int number_callback(const struct option *opt, const char *arg, int unset)
> +{
> +	BUG_ON_OPT_NEG(unset);
> +	*(int *)opt->value = strtol(arg, NULL, 10);
> +	return 0;
> +}
>
>  static int parse_archive_args(int argc, const char **argv,
>  		const struct archiver **ar, struct archiver_args *args,
> @@ -561,16 +563,8 @@ static int parse_archive_args(int argc, const char **argv,
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>  			N_("read .gitattributes in working directory")),
>  		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
> -		OPT__COMPR('0', &compression_level, N_("store only"), 0),
> -		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
> -		OPT__COMPR_HIDDEN('2', &compression_level, 2),
> -		OPT__COMPR_HIDDEN('3', &compression_level, 3),
> -		OPT__COMPR_HIDDEN('4', &compression_level, 4),
> -		OPT__COMPR_HIDDEN('5', &compression_level, 5),
> -		OPT__COMPR_HIDDEN('6', &compression_level, 6),
> -		OPT__COMPR_HIDDEN('7', &compression_level, 7),
> -		OPT__COMPR_HIDDEN('8', &compression_level, 8),
> -		OPT__COMPR('9', &compression_level, N_("compress better"), 9),
> +		OPT_NUMBER_CALLBACK(&compression_level,
> +			N_("set compression level"), number_callback),

Doubly nice.  Adds a feature while removing lines.  

Do we miss the description given in "git archive -h" though?

    usage: git archive [<options>] <tree-ish> [<path>...]
       or: git archive --list
       ...
        -v, --verbose         report archived files on stderr
        -0                    store only
        -1                    compress faster
        -9                    compress better


* [PATCH] archive: support compression levels beyond 9
@ 2020-11-09 16:05  9% René Scharfe
  2020-11-09 18:35  0% ` Junio C Hamano
  0 siblings, 1 reply; 12+ results
From: René Scharfe @ 2020-11-09 16:05 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Compression programs like zip, gzip, bzip2 and xz allow adjusting the
trade-off between CPU cost and size gain with numerical options from -1
for fast compression to -9 for a high compression ratio.  zip also
accepts -0 for storing files verbatim.  git archive directly supports
these single-digit compression levels for ZIP output and passes them to
filters like gzip.

Zstandard additionally supports compression level options -10 to -19, or
up to -22 with --ultra.  This *seems* to work with git archive in most
cases, e.g. it will produce an archive with -19 without complaining, but
since it only supports single-digit compression level options, -19 is
parsed as -1 followed by -9 and is thus the same as -9.

Allow git archive to accept multi-digit compression levels to support
the full range supported by zstd.  Explicitly reject them for the ZIP
format, as otherwise deflateInit2() would just fail with a somewhat
cryptic "stream consistency error".

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 archive-tar.c |  3 ++-
 archive.c     | 26 +++++++++++---------------
 archive.h     |  1 +
 3 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index f1a1447ebd..a971fdc0f6 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -374,7 +374,8 @@ static int tar_filter_config(const char *var, const char *value, void *data)
 		ar = xcalloc(1, sizeof(*ar));
 		ar->name = xmemdupz(name, namelen);
 		ar->write_archive = write_tar_filter_archive;
-		ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS;
+		ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS |
+			    ARCHIVER_HIGH_COMPRESSION_LEVELS;
 		ALLOC_GROW(tar_filters, nr_tar_filters + 1, alloc_tar_filters);
 		tar_filters[nr_tar_filters++] = ar;
 	}
diff --git a/archive.c b/archive.c
index 3c1541af9e..7a888c5338 100644
--- a/archive.c
+++ b/archive.c
@@ -529,10 +529,12 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }

-#define OPT__COMPR(s, v, h, p) \
-	OPT_SET_INT_F(s, NULL, v, h, p, PARSE_OPT_NONEG)
-#define OPT__COMPR_HIDDEN(s, v, p) \
-	OPT_SET_INT_F(s, NULL, v, "", p, PARSE_OPT_NONEG | PARSE_OPT_HIDDEN)
+static int number_callback(const struct option *opt, const char *arg, int unset)
+{
+	BUG_ON_OPT_NEG(unset);
+	*(int *)opt->value = strtol(arg, NULL, 10);
+	return 0;
+}

 static int parse_archive_args(int argc, const char **argv,
 		const struct archiver **ar, struct archiver_args *args,
@@ -561,16 +563,8 @@ static int parse_archive_args(int argc, const char **argv,
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
 			N_("read .gitattributes in working directory")),
 		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
-		OPT__COMPR('0', &compression_level, N_("store only"), 0),
-		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
-		OPT__COMPR_HIDDEN('2', &compression_level, 2),
-		OPT__COMPR_HIDDEN('3', &compression_level, 3),
-		OPT__COMPR_HIDDEN('4', &compression_level, 4),
-		OPT__COMPR_HIDDEN('5', &compression_level, 5),
-		OPT__COMPR_HIDDEN('6', &compression_level, 6),
-		OPT__COMPR_HIDDEN('7', &compression_level, 7),
-		OPT__COMPR_HIDDEN('8', &compression_level, 8),
-		OPT__COMPR('9', &compression_level, N_("compress better"), 9),
+		OPT_NUMBER_CALLBACK(&compression_level,
+			N_("set compression level"), number_callback),
 		OPT_GROUP(""),
 		OPT_BOOL('l', "list", &list,
 			N_("list supported archive formats")),
@@ -617,7 +611,9 @@ static int parse_archive_args(int argc, const char **argv,

 	args->compression_level = Z_DEFAULT_COMPRESSION;
 	if (compression_level != -1) {
-		if ((*ar)->flags & ARCHIVER_WANT_COMPRESSION_LEVELS)
+		int levels_ok = (*ar)->flags & ARCHIVER_WANT_COMPRESSION_LEVELS;
+		int high_ok = (*ar)->flags & ARCHIVER_HIGH_COMPRESSION_LEVELS;
+		if (levels_ok && (compression_level <= 9 || high_ok))
 			args->compression_level = compression_level;
 		else {
 			die(_("Argument not supported for format '%s': -%d"),
diff --git a/archive.h b/archive.h
index 82b226011a..e3d04e8ab3 100644
--- a/archive.h
+++ b/archive.h
@@ -36,6 +36,7 @@ const char *archive_format_from_filename(const char *filename);

 #define ARCHIVER_WANT_COMPRESSION_LEVELS 1
 #define ARCHIVER_REMOTE 2
+#define ARCHIVER_HIGH_COMPRESSION_LEVELS 4
 struct archiver {
 	const char *name;
 	int (*write_archive)(const struct archiver *, struct archiver_args *);
--
2.29.2


* Re: [PATCH] read-cache.c: index format v5 -- 30% smaller/faster than v4
  2019-02-14 10:14  0%   ` Duy Nguyen
@ 2019-02-15 20:22  0%     ` Ben Peart
  0 siblings, 0 replies; 12+ results
From: Ben Peart @ 2019-02-15 20:22 UTC (permalink / raw)
  To: Duy Nguyen, Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano



On 2/14/2019 5:14 AM, Duy Nguyen wrote:
> On Thu, Feb 14, 2019 at 5:02 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>> Take a look at stat data, st_dev, st_uid, st_gid and st_mode are the
>>> same most of the time. ctime should often be the same (or differs just
>>> slightly). And sometimes mtime is the same as well. st_ino is also
>>> always zero on Windows. We're storing a lot of duplicate values.
>>>
>>> Index v5 handles this
>>
>> This looks really promising.
> 
> I was going to reply to Junio. But it turns out I underestimated
> "varint" encoding overhead and it increases read time too much. I
> might get back and try some optimization when I'm bored, but until
> then this is yet another failed experiment.
> 
>>> As a result of this, v5 reduces file size from 30% (git.git) to
>>> 36% (webkit.git) compared to v4. Comparing to v2, webkit.git index file
>>> size is reduced by 63%! An 8.4MB index file is _almost_ acceptable.
>>>

Just for kicks, I tried this out on a couple of repos I have handy.

files  version  index size    %savings
200k   2         25,033,758    0.00%
       3         25,033,758    0.00%
       4         15,269,923   39.00%
       5          9,759,844   61.01%

3m     2        446,123,848    0.00%
       3        446,123,848    0.00%
       4        249,631,640   44.04%
       5         82,147,981   81.59%

The 81% savings is very impressive.  I didn't measure performance, but 
not writing out an extra 167MB to disk has to help.

I'm definitely also interested in your 'sparse index' format ideas, as 
in our 3M-file repos there are typically only a few thousand entries 
that don't have the skip-worktree bit set.  I'm not sure if that is the 
same 'sparse' you had in mind, but it would sure be nice!



I've also contemplated multi-threading the index write code path.  My 
thought was to have the primary thread allocate a buffer and, when it 
is full, have a background thread compute the SHA and write it to disk 
while the primary thread fills the next buffer.

I'm not sure how much it will buy us, as I don't know the relative cost 
of computing the SHA/writing to disk vs. filling the buffer.  I've 
suspected the buffer-filling thread would end up blocked on the 
background thread most of the time, which is why I haven't tried it yet.

>>> Of course we trade off storage with cpu. We now need to spend more
>>> cycles writing or even reading (but still plenty fast compared to
>>> zlib). For reading, I'm counting on multi thread to hide away all this
>>> even if it becomes significant.
>>
>> This would be a bigger change, but have we/you ever done a POC
>> experiment to see how much of this time is eaten up by zlib that
>> wouldn't be eaten up with some of the newer "fast but good enough"
>> compression algorithms, e.g. Snappy and Zstandard?
> 
> I'm quite sure I tried zlib at some point; the only lasting impression
> I have is "not good enough". Other algorithms might improve a bit,
> perhaps on the uncompress/read side, but I find it unlikely we could
> reasonably compress like a hundred megabytes in a few dozen
> milliseconds (a quick google says Snappy compresses 250MB/s, so about
> 400ms per 100MB, too long). Splitting the files and compressing in
> parallel might help. But I will probably focus on "sparse index"
> approach before going that direction.
> 


* Re: [PATCH] read-cache.c: index format v5 -- 30% smaller/faster than v4
  2019-02-14 10:02 10% ` Ævar Arnfjörð Bjarmason
@ 2019-02-14 10:14  0%   ` Duy Nguyen
  2019-02-15 20:22  0%     ` Ben Peart
  0 siblings, 1 reply; 12+ results
From: Duy Nguyen @ 2019-02-14 10:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List, Junio C Hamano

On Thu, Feb 14, 2019 at 5:02 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > Take a look at stat data, st_dev, st_uid, st_gid and st_mode are the
> > same most of the time. ctime should often be the same (or differs just
> > slightly). And sometimes mtime is the same as well. st_ino is also
> > always zero on Windows. We're storing a lot of duplicate values.
> >
> > Index v5 handles this
>
> This looks really promising.

I was going to reply to Junio. But it turns out I underestimated
"varint" encoding overhead and it increases read time too much. I
might get back and try some optimization when I'm bored, but until
then this is yet another failed experiment.

> > As a result of this, v5 reduces file size from 30% (git.git) to
> > 36% (webkit.git) compared to v4. Comparing to v2, webkit.git index file
> > size is reduced by 63%! An 8.4MB index file is _almost_ acceptable.
> >
> > Of course we trade off storage with cpu. We now need to spend more
> > cycles writing or even reading (but still plenty fast compared to
> > zlib). For reading, I'm counting on multi thread to hide away all this
> > even if it becomes significant.
>
> This would be a bigger change, but have we/you ever done a POC
> experiment to see how much of this time is eaten up by zlib that
> wouldn't be eaten up with some of the newer "fast but good enough"
> compression algorithms, e.g. Snappy and Zstandard?

I'm quite sure I tried zlib at some point; the only lasting impression
I have is "not good enough". Other algorithms might improve a bit,
perhaps on the uncompress/read side, but I find it unlikely we could
reasonably compress like a hundred megabytes in a few dozen
milliseconds (a quick google says Snappy compresses 250MB/s, so about
400ms per 100MB, too long). Splitting the files and compressing in
parallel might help. But I will probably focus on "sparse index"
approach before going that direction.
-- 
Duy


* Re: [PATCH] read-cache.c: index format v5 -- 30% smaller/faster than v4
  @ 2019-02-14 10:02 10% ` Ævar Arnfjörð Bjarmason
  2019-02-14 10:14  0%   ` Duy Nguyen
  0 siblings, 1 reply; 12+ results
From: Ævar Arnfjörð Bjarmason @ 2019-02-14 10:02 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git


On Wed, Feb 13 2019, Nguyễn Thái Ngọc Duy wrote:

> Index file size more or less translates to write time because we hash
> the entire file every time we update the index. And we update the index
> quite often (automatic index refresh is done everywhere). This means
> smaller index files are faster, especially true for very large
> worktrees.
>
> Index v4 attempts to reduce file size by "prefix compressing"
> paths. This reduces file size from 17% (git.git) to 41% (webkit.git,
> deep hierarchy).
>
> Index v5 takes the same idea to the next level. Instead of compressing
> just paths, based on the previous entry, we "compress" a lot more
> fields.
>
> Take a look at stat data, st_dev, st_uid, st_gid and st_mode are the
> same most of the time. ctime should often be the same (or differs just
> slightly). And sometimes mtime is the same as well. st_ino is also
> always zero on Windows. We're storing a lot of duplicate values.
>
> Index v5 handles this

This looks really promising.

>  - by adding a "same mask" per entry. If st_dev is the same as previous
>    entry, for instance, we set "st_dev is the same" flag and will not
>    store it at all, saving 31 bits per entry.
>
>  - even when we store it, "varint" encoding is used. We should rarely
>    need to write out 4 bytes
>
>  - for ctime and mtime, even if we have to store it, we store the offset
>    instead of absolute numbers. This often leads to smaller numbers,
>    which also means fewer bytes to encode.

Sounds good. I wonder if you've thought about/considered a couple of
optimizations on top of this, or if they're possible. Both share the
same theme:

* Instead of adding a "same as last mask" adding "same as Nth
  mask". Something similar exists in the Sereal format (which also has
  other techniques you use, e.g. varint
  https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod#the-copy-tag)

  So instead of:

      <mask1><same><mask2><same><mask1><same> etc.

   You'd have:

      <mask1 (mark1)><same><mask2 (mark2)><same><insert: mark1><same> etc.

   I.e. when you have data that flip-flops a lot you can save space by
   saying "it's the same as existing earlier value at offset N". Maybe
   it doesn't make sense for this data, I don't know...

* For ctime/mtime presumably for dir paths, are these paths tolerant to
  or already out of glob() order? Then perhaps they can be pre-sorted so
  the compression or ctime/mtime offset compression is more effective.

> As a result of this, v5 reduces file size from 30% (git.git) to
> 36% (webkit.git) compared to v4. Comparing to v2, webkit.git index file
> size is reduced by 63%! An 8.4MB index file is _almost_ acceptable.
>
> Of course we trade off storage with cpu. We now need to spend more
> cycles writing or even reading (but still plenty fast compared to
> zlib). For reading, I'm counting on multi thread to hide away all this
> even if it becomes significant.

This would be a bigger change, but have we/you ever done a POC
experiment to see how much of this time is eaten up by zlib that
wouldn't be eaten up with some of the newer "fast but good enough"
compression algorithms, e.g. Snappy and Zstandard?


* Re: SHA1 collisions found
  2017-02-26 21:38 11%                       ` Ævar Arnfjörð Bjarmason
@ 2017-02-26 21:52  0%                         ` Jeff King
  0 siblings, 0 replies; 12+ results
From: Jeff King @ 2017-02-26 21:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Linus Torvalds, brian m. carlson, Jason Cooper, ankostis,
	Junio C Hamano, Git Mailing List, Stefan Beller, David Lang,
	Ian Jackson, Joey Hess

On Sun, Feb 26, 2017 at 10:38:35PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > But yes, SHA3-256 looks like the sane choice. Performance of hashing
> > is important in the sense that it shouldn't _suck_, but is largely
> > secondary. All my profiles on real loads (well, *my* real loads) have
> > shown that zlib performance is actually much more important than SHA1.
> 
> What's the zlib v.s. hash ratio on those profiles? If git is switching
> to another hashing function given the developments in faster
> compression algorithms (gzip v.s. snappy v.s. zstd v.s. lz4)[1] we'll
> probably switch to another compression algorithm sooner rather than later.
> 
> Would compression still be the bottleneck by far with zstd, how about with lz4?
> 
> 1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/

zstd does help in normal operations that access lots of blobs. Here are
some timings:

  http://public-inbox.org/git/20161023080552.lma2v6zxmyaiiqz5@sigill.intra.peff.net/

Compression is part of the on-the-wire packfile format, so it introduces
compatibility headaches. Unlike the hash, it _can_ be a local thing
negotiated between the two ends, and a server with zstd data could
convert on-the-fly to zlib. You just wouldn't want to do so on a server
because it's really expensive (or you double your cache footprint to
store both).

If there were a hash flag day, we _could_ make sure all post-flag-day
implementations have zstd, and just start using that (it transparently
handles old zlib data, too). I'm just hesitant to throw in the kitchen
sink and make the hash transition harder than it already is.

Hash performance doesn't matter much for normal read operations. If your
implementation is really _slow_ it does matter for a few operations
(notably index-pack receiving a large push or fetch). Some timings:

  http://public-inbox.org/git/20170223230621.43anex65ndoqbgnf@sigill.intra.peff.net/

If the new algorithm is faster than SHA-1, that might be measurable in
those operations, too, but obviously less dramatic, as hashing is just a
percentage of the total operation (so it can balloon the time if it's
slow, but optimizing it can only save so much).

I don't know if the per-hash setup cost of any of the new algorithms is
higher than SHA-1. We care as much about hashing lots of small content
as we do about sustained throughput of a single hash.

-Peff


* Re: SHA1 collisions found
  @ 2017-02-26 21:38 11%                       ` Ævar Arnfjörð Bjarmason
  2017-02-26 21:52  0%                         ` Jeff King
  0 siblings, 1 reply; 12+ results
From: Ævar Arnfjörð Bjarmason @ 2017-02-26 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Ian Jackson,
	Joey Hess

On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> But yes, SHA3-256 looks like the sane choice. Performance of hashing
> is important in the sense that it shouldn't _suck_, but is largely
> secondary. All my profiles on real loads (well, *my* real loads) have
> shown that zlib performance is actually much more important than SHA1.

What's the zlib v.s. hash ratio on those profiles? If git is switching
to another hashing function given the developments in faster
compression algorithms (gzip v.s. snappy v.s. zstd v.s. lz4)[1] we'll
probably switch to another compression algorithm sooner rather than later.

Would compression still be the bottleneck by far with zstd, how about with lz4?

1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/


2017-02-24 15:13     SHA1 collisions found Ian Jackson
2017-02-24 17:32     ` Junio C Hamano
2017-02-24 17:45       ` David Lang
2017-02-24 18:14         ` Junio C Hamano
2017-02-24 18:58           ` Stefan Beller
2017-02-24 19:20             ` Junio C Hamano
2017-02-24 20:05               ` ankostis
2017-02-24 20:32                 ` Junio C Hamano
2017-02-25  0:31                   ` ankostis
2017-02-26  0:16                     ` Jason Cooper
2017-02-26 17:38                       ` brian m. carlson
2017-02-26 19:11                         ` Linus Torvalds
2017-02-26 21:38 11%                       ` Ævar Arnfjörð Bjarmason
2017-02-26 21:52  0%                         ` Jeff King
2019-02-13 12:08     [PATCH] read-cache.c: index format v5 -- 30% smaller/faster than v4 Nguyễn Thái Ngọc Duy
2019-02-14 10:02 10% ` Ævar Arnfjörð Bjarmason
2019-02-14 10:14  0%   ` Duy Nguyen
2019-02-15 20:22  0%     ` Ben Peart
2020-11-09 16:05  9% [PATCH] archive: support compression levels beyond 9 René Scharfe
2020-11-09 18:35  0% ` Junio C Hamano
2020-11-09 23:48 14%   ` René Scharfe
2021-04-14  6:13     Pain points in Git's patch flow Jonathan Nieder
2021-04-15 15:45  7% ` Son Luong Ngoc
2021-04-19  2:57  0%   ` Eric Wong
2021-10-10 11:19 11% git archive -o something.tar.zst but file info just says "POSIX tar archive" Bagas Sanjaya
2023-01-31  0:06     Stability of git-archive, breaking (?) the Github universe, and a possible solution Eli Schwartz
2023-01-31  9:54     ` brian m. carlson
2023-01-31 15:05       ` Konstantin Ryabitsev
2023-01-31 22:32         ` brian m. carlson
2023-02-01  9:40           ` Ævar Arnfjörð Bjarmason
2023-02-01 11:34             ` demerphq
2023-02-01 12:21               ` Michal Suchánek
2023-02-01 12:48                 ` demerphq
2023-02-01 13:43                   ` Ævar Arnfjörð Bjarmason
2023-02-01 15:21 14%                 ` demerphq
